2 March 2018
David Altshuler is Executive Vice President, Global Research and Chief Scientific Officer at Vertex Pharmaceuticals and was the founding Chair of the Global Alliance for Genomics and Health. He was also a founding member of the Broad Institute of MIT and Harvard and played a leading role in the foundational genomic variation projects of the 2000s — the SNP Consortium, HapMap, and 1,000 Genomes Projects — as well as leading disease-specific genetics consortia in type 2 diabetes and heart disease. This work led to the early realization that an international body for harmonizing data sharing efforts was critical for genomic research and medicine to fulfill their promise. Dr. Altshuler serves on the GA4GH Strategic Advisory Board and was named a Champion of Change by the White House for his leadership in creating and leading GA4GH. As the organization ramps up its next phase, GA4GH Connect, we asked him to discuss its history and the need to enable responsible genomic data sharing — a need he says is even more relevant today than it was five years ago.
GA4GH News: What were some of the initial motivations for establishing the Global Alliance for Genomics and Health and how has this world changed since then?
David Altshuler: When I started in human genetics in the late 1990s, a human genetic study would typically have a hundred patients and query a single genetic variant. By 2012, studies of genomic variation and disease included millions of genetic variants, and were growing in scale from thousands and tens of thousands of people to hundreds of thousands; we could see that soon, millions of people would be contributing their genome data to inform science and medicine. But it had also become clear that we lacked, and would continue to lack for the foreseeable future, the ability to simply read a genome and interpret its variation in terms of disease or health. Instead, the approach was (and remains) to compare the genomes of many different people, and to see which variants track with which diseases in the population. In other words, the genome is not a book we know how to read: we can’t say that if a given A is changed to a T, then the person is more likely to get cancer. Rather, we look at the genomes of many people with and without cancer, and use that filter to determine which variants track with which diseases. Finally, we realized that no one organization was ever going to have enough data to really do that justice; what was needed to make progress was to look across all the data around the globe.
This prediction is more convincing today than it was five years ago. The justification is much clearer and more tangible: the collection of data and its use in medicine are happening even faster than we expected, and the need to improve interpretation is ever more pressing. Whether it's a common variant where the effect is modest and we need large numbers of people to obtain a precise estimate, or a rare variant where having millions of people gives you more observations of an uncommon event, more data does lead to better insights. Even large national programs such as UK Biobank and the All of Us Research Program, as well as private databases, lack the power and diversity of a worldwide effort. People are finding that if we look across the datasets, we learn more than we could with one alone. But that can only be done if there are reliable, responsible, and effective means to share data and learnings across studies.
GA4GH News: Why did you choose the federated approach rather than create a single operating entity that would generate, analyse, and hold all the data in one place?
DA: The idea of a federated network was modeled on the World Wide Web Consortium (W3C), and was actually very different from what had come before in genomics and health. In genomics, each project was a “one-off” and had to solve every problem for itself. The W3C is a very different beast: it’s a set of exchange standards that allow parties to both communicate with each other and to innovate in a shared environment. That is, the W3C isn’t Wikipedia or Facebook; it's the exchange format that allows everyone to share information and build such tools. This was something that I felt very passionate about: we had to figure out how to ensure different parties around the world could communicate and share with one another in a trusted manner, but also compete and innovate on how best to generate, store, and interpret the data. If we could do this right and establish an effective ecosystem, then everyone would have an incentive to participate. This is in contrast to trying to create “the great database in the sky” or an uber-solution. If one solution is selected, you immediately stifle innovation.
I would say that getting people to think in terms of a network, to understand and agree that it was worth doing, was the heaviest lift. In the beginning, people would say "well I'm already doing that." But what they meant was that they were collecting data, analysing it, and writing papers. They were even figuring out how to talk to a collaborator. But they weren’t establishing reliable technical and ethical standards to generate, store, manage, and make available data. The real question was how do we talk to each other and what are the rules of engagement?
Personally, I strongly believe that the approach being taken now by GA4GH is the right one. The GA4GH continues to address technical and regulatory and security issues — those are the horizontals. But it’s all in the context of real-world Driver Projects, which have the need and are already trying to solve these problems on their own. They need to be inside the tent because they're finding problems, they are proposing solutions, they're innovating. The trick is to take the best practices, disseminate them, and help everyone have the option, if they choose, of exchanging information to inform understanding of health and disease.
GA4GH News: How has GA4GH played a role in "moving the needle" on data sharing?
DA: First of all, GA4GH has raised the questions, created the dialogue, and catalyzed interactions between real-world projects and leaders in technology and ethics. Secondly, it is proposing a framework for addressing the issues and providing concrete, valuable documents and, in some cases, tools that the community can use and evolve. I've been very impressed by how some of the regulatory and ethics components have gotten traction. They incorporate best practices and are practical and clear, and they save every party from having to reinvent the wheel themselves.
Here’s another example: there's a file format (SAM/BAM) that was developed in the context of the 1,000 Genomes Project and that has become ubiquitous for storing next-generation sequencing data. It was originally developed by a specific collaboration between people in that project, but then became ubiquitous without any forward-looking governance. It simply didn't belong to anyone! It was felt that no one organization should own it because it was used by the whole field, but when GA4GH took it on, everyone was thrilled.
It’s worth stepping back and asking, “why is genomics in the place it is now?” I think that one of the main reasons was a really wonderful mix of collaborative projects, such as the Human Genome Project, HapMap, and TCGA, that brought people together, combined with competition and innovation across the projects and institutions. There’s a healthy tension between pulling people together to share best practice and learning and, on the other side, innovation and drive and entrepreneurial energy; that is a very good thing for an ecosystem. We carefully crafted the GA4GH mission to help with the sharing of best practice and interoperability, but in such a way as to trigger entrepreneurial energy and competition to find the best solution.
GA4GH News: Why is genomic and health-related data sharing important for human health and medicine — why is it imperative for us to get this right?
DA: There are two things I’d say about the sharing of data. The first is that the right to self-determination is very important: it’s the right of the individual to participate or not, and to have an understanding of and control over what happens to their own personal information. It is certainly not a universal standard, but it’s one that the founders of this organization believe in. So that's the first thing: if someone chooses to participate, that's great, and if they choose not to, that's fine too.
But having said that, we believe that understanding human health at a mechanistic level and a biological level can inform diagnostics and therapeutics. By providing better-quality information to help people prevent and treat disease, genomic data is clearly valuable; we've seen that over and over again. One example is in the medicines we develop at Vertex for cystic fibrosis: those medicines come directly from an understanding of the genetic basis of the disease and directly address that underlying cause. Without understanding how these genetic variants, both common and rare, contribute to this devastating disease, none of the medicines we now have available for patients would exist.
The other example I always like to cite is the BRCA Challenge. It’s been known for some time that BRCA mutations play an important role in breast cancer and ovarian cancer. There are tests done every day around the world. But if you have such a mutation, answering the question, “what does a given mutation mean to the patient’s risk?”, is beyond the scale of any single organization. Mutations are large in number and each individually rare, and so no one organization has enough information to really answer the question. The BRCA Challenge is an effort launched by GA4GH to provide women and families and caregivers better estimates of what their particular information means.
Making such insights possible across all of medicine, all the different diseases, and all the populations of the world is a task that is beyond any single organization. We share the mission of advancing the day when people can benefit from accurate, interpretable genetic diagnostics, insights into the causes of human disease, and the development of new therapies. Having appropriate standards (ethical, regulatory, and technical) to exchange information and to learn is a necessary condition for success. It's not sufficient, but it's a very important step forward.