Our Genomes, Unzipped by Daniel MacArthur (Something to watch and to observe, HIPAA and Privacy Act need an update. Personal Genomes Project.)

When we launched this website back in June, I welcomed readers with a promise that Genomes Unzipped would “ultimately be much more than just a group blog”. Indeed, the last four months of blogging have really just been a prelude of sorts to what comes next: the real Genomes Unzipped.

Today we’re launching an exciting new phase of the project. Although we’re not entirely sure where this journey will take us, we’re looking forward to finding out – and to bringing you along with us.

What are we doing?

Over the last year, all the members of Genomes Unzipped have had genome scans performed by personal genomics company 23andMe; several of us have also had additional tests done by other genetic testing companies (Counsyl, deCODEme). From today, we’ll be making all of our raw genetic data and the reports generated from these tests freely available online. As the project proceeds, we aim to obtain data from an ever larger array of tests – ultimately extending to whole-genome sequencing – and release it openly. Right now you can freely download the 23andMe data from everyone in the project from this website.

Over the next few weeks, each of the members will be writing about their own experiences with genetic testing, and what they’ve learnt from their own genetic data. We’ll be discussing analyses we’ve performed on our own raw data, using software written both by group members and other collaborators; and we’ll be releasing the code for that software in our new code repository. We’ll also be talking about the process of deciding to release our genetic data publicly, and how we discussed this decision with our families.
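For readers wondering what working with raw data of this kind might look like, here is a minimal sketch in Python. It is not the project's own software. It assumes the download is a tab-separated text file in which comment lines start with `#` and each remaining line holds an rsID, chromosome, position and genotype, with `--` marking a no-call. The sample rows and the function names `read_genotypes` and `summarise` are made up purely for illustration.

```python
import csv
import io
from collections import Counter

# Made-up sample rows in the assumed layout: comment lines start with '#',
# then tab-separated rsid, chromosome, position, genotype.
SAMPLE = """\
# rsid\tchromosome\tposition\tgenotype
rs0000001\t1\t1000\tAA
rs0000002\t1\t2000\tAG
rs0000003\t2\t1500\t--
rs0000004\tX\t3000\tC
"""


def read_genotypes(handle):
    """Yield (rsid, chromosome, position, genotype), skipping comment lines."""
    data_lines = (line for line in handle if not line.startswith("#"))
    for row in csv.reader(data_lines, delimiter="\t"):
        if len(row) != 4:
            continue  # ignore blank or malformed lines
        rsid, chrom, pos, genotype = row
        yield rsid, chrom, int(pos), genotype


def summarise(records):
    """Count called genotypes per chromosome and the total number of no-calls."""
    calls = Counter()
    no_calls = 0
    for _rsid, chrom, _pos, genotype in records:
        if genotype == "--":  # assumed no-call marker
            no_calls += 1
        else:
            calls[chrom] += 1
    return calls, no_calls


calls, no_calls = summarise(read_genotypes(io.StringIO(SAMPLE)))
print(dict(calls), no_calls)
```

To try this on a real download, replace `io.StringIO(SAMPLE)` with an open file handle, after checking that the file actually follows the layout assumed above.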

To make it easier for us (and you) to explore our genomes, we have assembled a custom genome browser using JBrowse – this provides a visual interface that allows our 23andMe (and later, complete sequence) data to be viewed in the context of genes and other features. It’s still in prototype form, but we’ll be refining it and adding more data as the project proceeds.

Why are we doing this?

When I first started thinking about a new group blog back in late 2009, the idea was fairly simple: put together a group of people who were experts in fields related to personal genomics, help them get access to their own genetic data, and create a platform for them to talk about what they found. I quickly joined forces with Luke and we refined the idea further.

As we discussed the notion of a group of experts analysing their own genomes, one thing rapidly became clear: for maximum public benefit the analyses had to be open and reproducible, and that meant making the underlying data public. In other words, for this to work, members of the group had to be ready to spill their genetic secrets to the world.

In September 2009 I had an opportunity to purchase a sizeable number of kits from one testing company (23andMe) at a discount, and quickly contacted a group of the smartest people I knew in genomics – all initially based in the Cambridge area – to take advantage of the offer. Thus the project was born; fittingly, our first meeting was held over pints of ale in The Eagle, the pub frequented by Watson and Crick during their early work on the structure of DNA.

Initially there was some discussion about various models for partial anonymity – not linking people’s names to their data, or allowing people to write under a pseudonym, for instance. However, given there was no way for us to guarantee that we could protect the identity of the participants once their data were released, we decided that the only viable solution was for members to write under their own full names from the outset, and to have their genetic data transparently linked to their identity.

Remarkably, despite being given every opportunity to change their minds, nearly everyone I approached to join the group still decided to go ahead with the decision to share their genetic data online. We all made that decision for a variety of reasons, but there are some common threads:

  • we want to share the results of scientific analysis of our own genomes, and as proponents of open data access most of us believe that doing good science means releasing complete data for others to investigate;
  • we hope that releasing our data publicly will help to guide useful discussions about genetic privacy and the benefits, risks and limitations of genetic information in general;
  • many of us believe that the ideal resource for genetic research is large open-access, non-anonymous research databases such as the Personal Genome Project, and that sharing linked genetic and trait information openly with the wider community is a public good – and we hope that our own experiences will encourage others to participate in open research projects;
  • we all believe that many of the fears expressed about the dangers of genetic information are exaggerated, and see this project as an opportunity to have a constructive public discussion about the truth behind these fears;
  • given the ease with which a dedicated snoop could obtain genetic information surreptitiously (via shed skin, hair or saliva, for instance), some of us argue that the whole notion of genetic privacy is illusory anyway – while releasing our data online makes it easier for people to get hold of it, this is a difference of degree rather than kind.

What about the risks?

We’re going into this process with our eyes wide open. Everyone in the group has a sound background knowledge of genetics: we know the sorts of things that can be found in a genome, and what such discoveries can mean for individuals and their families. However, like others willing to share their genetic data – such as the participants in the Personal Genome Project (PGP) – we simply feel that the potential benefits of this project outweigh its potential harms.

To ensure that everyone in the group is making a fully informed decision, we’ve put together a lengthy informed consent document (PDF; modified from the consent forms used for the PGP) that lays out the risks and issues involved in disclosing genetic information publicly. This document explains exactly what we’re putting on the line here: anyone in the world can now access our genetic data and infer information about our disease risks and our genetic relationships with other people, and it’s possible to imagine all sorts of ways in which that knowledge could be abused.

It also explains that these risks could apply to our families: our published data could be used, for instance, to infer the risk of serious disease in our parents, siblings or children. We have encouraged all of the members of the project to discuss these issues with their first-degree relatives to ensure they are as fully aware of the potential risks as possible.

Finally, the document points out that there is the class of unknown unknowns: risks that no-one knows about yet. To the best of our ability, the members of the group have weighed up all of these uncertainties and decided to go ahead.

We can’t fully predict what the future holds, but there are good reasons to be optimistic that the risks of disclosing genetic information will be minimal. As the passing of GINA in the US shows, in Western countries there is strong public opposition to the idea of unfair discrimination against individuals on the basis of genetic information. While this opposition hasn’t yet been codified into law outside the US, there’s every reason to expect that individuals who try to abuse genetic information will ultimately pay a high legal or social price.

As a group we expect that our genomes will be joined by many, many others in the public domain over the next few years. As the sheer power of open databases of genetic and medical data becomes clear, we anticipate that participating in such studies will be increasingly viewed as something of a moral imperative. Already there are over 10,000 individuals signed up for disclosure of their genetic data as part of the Personal Genome Project, and that number is growing fast.

As we move towards a world where thousands of people release their genomes into the public domain, someone has to be the guinea pig. We take comfort from the fact that others have already paved the way for this project: Craig Venter, James Watson, and the members of the PGP-10, for instance. Like them, we feel we are well-equipped with the knowledge required to respond to any serious consequences that arise as a result of genetic disclosure. We hope that our experiences and those of other early disclosers will provide valuable lessons for those who follow.

What next?

Over the next few weeks we’ll be discussing the ways in which we’ve peered into our own genetic data, and providing you with some of the tools and background knowledge you would need to do the same. You’ll have a chance to ask people who work with genetic information for a living about what they’ve gleaned from their own genomes.

Moving forward, we hope that we can use our own data as a resource for developing new tools for analysing personal genetic data. In addition to the data of core group members, we plan to host data from others who are also willing to share their genomes. We will also be releasing the software for the analyses we perform here for others to use and modify, and will welcome submission of other people’s programs to the GNZ code repository. Ultimately, we hope that we can become a hub for a diverse community of people interested in building and using tools for exploring their own DNA.

We will continue to explore the personal genomics marketplace, obtaining and reviewing new products and services as resources permit. For the time being we are a collection of like-minded individuals with limited funding (if you’d like to help remedy that, please let us know). Our commitment to openness extends to our relationships with personal genomics companies and with our funders: our disclosures page will contain full information about how we have obtained personal genomics products and services and from whom we have obtained funding.

We will also be posting invited commentary from external experts in genomics, bio-ethics, philosophy and law about the issues surrounding open genetic data release, and around genetic testing in general. And importantly, we’ll be looking for help from you, our readers: by participating in discussion, suggesting new analyses, testing the software and resources we describe here, and contributing your own tools, you can help build the dynamic community we want this website to become.


This post is already pretty long, but I couldn’t finish without broadcasting sincere thanks to a number of people for helping the project get to this stage.

Firstly, to all the members of Genomes Unzipped, most of all to Luke, who has worked tirelessly on every aspect of this project, and is almost single-handedly responsible for getting the website up and running; also particularly to Dan, who has made sure we steered clear of legal and ethical minefields and drafted the project’s informed consent form; Kate, who worked hard on the project’s internal site and designed the website banner; Caroline, always quick with practical feedback and advice; and my wife Ilana, for many hours of discussion about the project’s goals and logistics, and for agreeing to contribute her own genetic data. Special thanks also to Joe for working with Luke on the genome browser.

Outside the project, we were fortunate to receive guidance from many wise individuals. We are extremely grateful to Zoe McDougall from Oxford Nanopore for her incredibly useful advice on many diverse topics over the last year. We are also indebted to Mark Henderson from the Times for many useful discussions, and to David Hooper from Reynolds Porter Chamberlain LLP and Alison Hall from the PHG Foundation for informal legal advice. We’d also like to thank the PHG Foundation for their generous grant of £600 to build and maintain the website.

Finally, thanks to the readers who’ve joined us over the last four months while we honed our blogging skills, sharpened up the website and prepared for phase 2. We’ve enjoyed the discussions we’ve had with you – and we look forward to even more fruitful discussion as the project moves into this new era.