12 May 2020
The urgency of scientific data sharing is never more apparent than during a global disease outbreak. Today we hear from Marc Fiume, CEO of DNAstack and Co-Lead of the GA4GH Discovery Work Stream, about the COVID-19 Beacon, an initiative aimed at making viral genomic datasets discoverable for investigators around the world.
Angela Page: Welcome to the OmicsXchange. I’m Angela Page. The urgency of scientific data sharing is never more apparent than during a global disease outbreak. This episode marks the first in a series of conversations on the role of data sharing during the COVID-19 pandemic. We’ll spend the next few weeks speaking with members of the international genomics community about new initiatives that leverage collaboration, interoperability, and open science to advance research into the novel coronavirus. Today we hear from Marc Fiume, CEO of DNAstack and co-lead of the GA4GH Discovery Work Stream, about the COVID-19 Beacon, an initiative aimed at making viral genomic datasets discoverable for investigators around the world. Welcome, Marc.
Marc Fiume: Thank you so much for having me. It’s wonderful to speak about open data sharing in a time where the world needs it most.
Angela Page: So tell us a little bit about the COVID-19 Beacon. What is the focus of this project?
Marc Fiume: The COVID-19 Beacon is a search engine across publicly available virus genomes that visualizes, for a given mutation, the geographic and evolutionary origins of that sequence. We’re updating the Beacon nightly to include more genome sequences as they’re published around the world, and there are now over 5,600 genome sequences. The COVID-19 Beacon builds upon the latest version of the Beacon specification being led by Jordi Rambla at ELIXIR and a team of international collaborators within GA4GH, specifically the Discovery Work Stream.
Angela Page: And why is this important during a global pandemic, in a time where so many regions of the world are in crisis?
Marc Fiume: I think one of the really interesting observations about COVID-19 and our response to it is that it kind of reminds you of what your priorities are. At home you call your parents and hug your kids. And at work, it has really helped us to recognize the order of our priorities when we approach science collaboratively. I think it’s been really great to see open science take priority over some other things that could potentially block data sharing, like attribution, for example. So it’s really, really important that we share data in as close to real time as we can, nationally and internationally, to increase the volume and diversity of information that scientists have to look at the virus and the host that it’s affecting.
Angela Page: How did the project get off the ground?
Marc Fiume: As things were escalating with the COVID-19 outbreak, we thought, you know, GA4GH develops a lot of standards for open data sharing, and my group focuses primarily on data discovery, so essentially creating search engines for biomedical data, whether it’s for genotypes or moving into the space of phenotypic and other metadata. And so Beacon is the simplest. It’s sort of this genomic “Go Fish,” where you can ask a Beacon, “Do you have information about this variant, yes or no?” And optionally, a data provider can tell you more. So in the context of human genetics, you might say, here are the phenotypes associated with patients who have this variant: breast cancer, for example. But the standard is agnostic to what the payload, or what that metadata, can include. The payload is the package of information that flows along with the Beacon response. So the question is now, is there a viral sequence with this mutation, yes or no? And we repurposed the payload to include information about the viral sequence in which the mutation was found: things like geography, that is, where in the world that virus was found, or which evolutionary strain it maps to. And it gives scientists a better understanding of how the virus is mutating, where it comes from, and how it might be evolving as it transmits itself around the world.
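The "Go Fish" exchange described above can be illustrated with a minimal in-memory sketch in Python. This is not the Beacon specification and not the COVID-19 Beacon's implementation: the names `BeaconResponse` and `ToyBeacon`, the variant key format, and the sample records are all hypothetical, chosen only to show a yes/no answer with an optional payload.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical sample data: variant key -> metadata about sequences carrying it.
# The key format and the values are invented for illustration only.
SAMPLE_RECORDS: Dict[str, List[Dict[str, str]]] = {
    "pos100:A>G": [
        {"geography": "Region A", "strain": "Strain X"},
        {"geography": "Region B", "strain": "Strain X"},
    ],
}


@dataclass
class BeaconResponse:
    """A yes/no answer plus an optional payload of metadata."""
    exists: bool
    payload: List[Dict[str, str]] = field(default_factory=list)


class ToyBeacon:
    """Answers 'do you have this variant?' over an in-memory dataset."""

    def __init__(self, records: Dict[str, List[Dict[str, str]]],
                 share_payload: bool = True) -> None:
        self._records = records
        # A data provider may choose to answer only yes/no.
        self._share_payload = share_payload

    def query(self, variant: str) -> BeaconResponse:
        matches = self._records.get(variant, [])
        if not matches:
            return BeaconResponse(exists=False)
        payload = list(matches) if self._share_payload else []
        return BeaconResponse(exists=True, payload=payload)


if __name__ == "__main__":
    beacon = ToyBeacon(SAMPLE_RECORDS)
    answer = beacon.query("pos100:A>G")
    print(answer.exists)
    for item in answer.payload:
        print(item["geography"], item["strain"])
```

The payload is deliberately an untyped list of dictionaries here, mirroring the point that the standard does not dictate what the metadata contains; a provider created with `share_payload=False` still answers yes or no but returns nothing else.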
Angela Page: So is this one data set that has lit a Beacon or is it multiple data sets coming together under one hood?
Marc Fiume: So the Beacon Network is actually a search engine across over 100 beacons that are distributed worldwide, and there are good reasons for those beacons being distributed across the world. Sometimes it’s the easiest way to create an implementation. But, you know, quite often there are regulatory restrictions on where human data can reside. The COVID-19 Beacon is slightly different in architecture as it stands today, in that we’ve centralized the dataset by ingesting viral genome sequence data from multiple data sources and hosting a single Beacon that can be searched. Later, we may see other viral genome sequence Beacons be lit, but currently there is only one, and it’s aggregating sequence information from multiple sources.
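The two architectures contrasted above, a network that fans a question out to many distributed beacons versus a single beacon built by ingesting several sources, can be sketched as follows. This is a toy model under stated assumptions, not how the Beacon Network or the COVID-19 Beacon is built: a "beacon" is reduced to a function from a variant string to True or False, and the helper names and source data are hypothetical.

```python
from typing import Callable, Dict, Set

# In this sketch a beacon is just a function: variant -> True/False.
Beacon = Callable[[str], bool]


def make_beacon(variants: Set[str]) -> Beacon:
    """Wrap a set of known variants as a yes/no beacon."""
    return lambda variant: variant in variants


def query_network(beacons: Dict[str, Beacon], variant: str) -> Dict[str, bool]:
    """Distributed model: ask every beacon; the data stays where it lives."""
    return {name: beacon(variant) for name, beacon in beacons.items()}


def build_central_beacon(sources: Dict[str, Set[str]]) -> Beacon:
    """Centralized model: ingest all sources into one index, serve one beacon."""
    merged: Set[str] = set()
    for variants in sources.values():
        merged |= variants
    return make_beacon(merged)


# Hypothetical sources with invented variant keys.
SOURCES: Dict[str, Set[str]] = {
    "source_one": {"pos100:A>G", "pos200:C>T"},
    "source_two": {"pos200:C>T"},
}

if __name__ == "__main__":
    network = {name: make_beacon(v) for name, v in SOURCES.items()}
    print(query_network(network, "pos100:A>G"))

    central = build_central_beacon(SOURCES)
    print(central("pos100:A>G"))
```

The distributed form reports which site answered yes without moving any data, which suits the regulatory constraints on human data mentioned above; the centralized form gives a single answer and is simpler to host when the underlying sequences are already public.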
Angela Page: How is this initiative helping to accelerate COVID-19 research?
Marc Fiume: There are a number of ways that analyzing this kind of information can inform research and public policy. The first one is strain identification, that is, helping to understand which strain of the virus someone has. This can be very helpful with diagnostics and treatment. The second one is transmission route tracing, that is, helping us to understand which strains of the virus are being transmitted from where to where, and this is helpful in understanding the effectiveness of public policy and where we need to enforce more stringent guidelines. Next is mutation rate, that is, we can understand the genetic makeup of the virus as it’s evolving through local clusters, now that we’re socially isolating and travel bans are physically restricting how far it can move. The next one is molecular structure: we can learn about the physical conformation of proteins that the virus is creating and whether it’s possible to repurpose drugs for similar-looking proteins produced by viruses like SARS, which have 97% sequence similarity to the novel coronavirus. And the last one is more prospective: it is future connectivity to host genome information. So we expect that genomic, clinical, and phenotypic data from human subjects who are affected by the virus will become more readily available. And it’s possible to link these to specific strains and understand what the underlying genetic risk factors and prognostic indicators are for a specific strain.
Angela Page: Do you have insights into how the scientific community is using the COVID-19 Beacon?
Marc Fiume: There are multiple applications that we could imagine for the COVID-19 Beacon. You can use it to track evolutionary strains, transmission routes, mutation rates, and potentially co-infections, if we were to add clinical metadata. The Beacon is meant to start the conversation around real-time, global, distributed data sharing, and we plan to increase the power of the COVID-19 Beacon but also launch a suite of other implementations of GA4GH standards that serve the same or similar kinds of data in different ways. So the Beacon itself right now has been queried a couple thousand times, and we’re getting new feature requests on how we could add either more functionality or more information to the payload to help scientists do science, like, how do these mutations affect the protein sequence and the 3D conformation of the proteins in which these variants lie? There is another really interesting use of the Beacon, and it’s always amazing to hear of ways people are using software that you don’t really anticipate. We heard from one expert in genome assembly, which is sort of playing Humpty Dumpty with genome sequence data, piecing these virus genomes back together from the raw data in order to come up with a linear sequence. We’ve heard these developers tell us that they’re using Beacons to sort of sanity check or debug their algorithms for doing the assembly, so that we can standardize methods for COVID-19 virus genome assembly and potentially other virus genomes in the future. So we’re finding out, you know, how the community is using the Beacon beyond the use cases we had in mind. We don’t yet have research results. But, you know, I could say that there are ambitious initiatives being started to do virus genome sequencing at scale. It’s going to be very important, too, for localities and regions and provinces and states and countries to be tracking the local evolution of the virus because, you know, there are travel bans, and so whatever strain of the virus is affecting our community might be unique to that community. So I think it’s going to be really interesting to understand how we could use data sharing technologies to track the local evolution of the virus and help us design more tailored treatments and diagnostic tools.
Angela Page: How does this project fit into larger overarching themes within GA4GH more broadly?
Marc Fiume: One of the goals of GA4GH has been to make data flow as pervasively as it does on the internet, constrained by, you know, recognizing that genomics data is highly sensitive and identifiable. And so we need to develop a suite of additional layers, a suite of standards that deal with that complexity, which we don’t have with information flowing across the open web. And so as GA4GH matures, it’s now in a position where we can set up integrations of multiple standards from across Work Streams. One of the initiatives we’re working on is called FASP, or the Federated Analysis Systems Project. It brings together Discovery, DURI, and Cloud Work Stream standards to realize a working integration for distributed discovery, access, and analysis on the cloud. I think that’s one of the holy grails for our field: how do we collectively learn from the wealth of information that’s being generated around the world, while respecting regulatory and privacy constraints? I think if we can do that, we’ll break through a lot of barriers and make discoveries really quickly. So without question, I think the COVID-19 Beacon will aim to fit into the FASP mold, in that we’re looking at the host genome for clues about why some people are asymptomatic while others develop severe illness and die from COVID-19. And when we go in that direction, it’ll be really important for us to develop systems that prospectively consent individuals to share their data with individuals who have the right research purpose and intent. During a time when the world really needs that access to data as quickly as possible, I think that’s going to really help accelerate science.
Angela Page: This is obviously a global health crisis in the truest sense, and many millions of people are being affected. How do you hope the COVID-19 Beacon will offer benefits to human health and medicine over time?
Marc Fiume: COVID-19 requires urgent attention. But we believe we will beat this. And at some point, we’ll have to return to normal. I think there’s an opportunity here in the time of crisis to mount a response, you know, much like an immune system, so that this doesn’t catch us off guard again. And so I think this is definitely raising the importance of population genome sequencing, and I hope that it inspires health systems and countries to kind of follow the lead of other countries, like what we see with All of Us and Genomics England and the Australian Genomics Health Alliance, to really look at population sequencing as a way to create a data resource that allows us to tap into scientific discoveries in times of need.
Angela Page: Well said. Thank you so much for speaking with us today, Marc.
Marc Fiume: Thank you guys so much.
Thank you for listening to the OmicsXchange—a podcast of the Global Alliance for Genomics and Health. The OmicsXchange podcast is produced by Stephanie Li and Caity Forgey, with music created by Rishi Nag. GA4GH is the international standards org for genomics, aimed at accelerating human health through data sharing. I’m Angela Page and this is the OmicsXchange.