4 October 2018
BASEL (October 4, 2018) — The Global Alliance for Genomics and Health (GA4GH) today announced the release of three new interoperability standards from the GA4GH Connect strategic roadmap: Beacon API V1.0.0, refget API V1.0.0, and Workflow Execution Service (WES) API V1.0.0.
Part of a larger suite of standards that together aim to create a federated network for responsible, international genomic data sharing, the new standards address issues of data discovery, reference sequence harmonisation, and cloud computing. They are designed to work either independently or in concert, as part of an end-to-end solution for responsible genomic data sharing.
“Together, these schemas and protocols overcome the most pressing technical challenges of sharing genomic and health-related data that will help all of us in the genomics community as we work to advance precision medicine,” said Ewan Birney, Chair of GA4GH and Director of the European Bioinformatics Institute (EMBL-EBI).
GA4GH CEO Peter Goodhand said, “Active contributors from a diverse cross section of the genomics community have been working hard over the past several months to bring these new deliverables to life. They represent the first major roll out of our multi-year strategic plan, which we believe will advance human health and medicine by enabling real-world genomic data sharing in the foreseeable future.”
About the New Deliverables
Beacon allows institutions to serve their data as a web-accessible service that users may query for information about a specific allele. Currently, researchers or clinicians interested in a particular allele must go through lengthy access protocols before even learning whether a dataset contains the allele of interest. Among other benefits, the Beacon API reduces this burden by providing an immediate yes/no answer to simple queries.
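To illustrate what such a yes/no query can look like, here is a minimal Python sketch. It assumes a Beacon v1-style `GET /query` endpoint that accepts `referenceName`, `start`, `referenceBases`, `alternateBases` and `assemblyId` parameters and answers with a JSON body containing an `exists` field. The base URL `beacon.example.org` and the helper names are hypothetical, and the sketch only builds the request URL and reads the answer; it makes no network call.

```python
import json
from urllib.parse import urlencode

# Hypothetical Beacon deployment; substitute a real Beacon's base URL.
BEACON_URL = "https://beacon.example.org"


def build_query_url(base_url, reference_name, start, reference_bases,
                    alternate_bases, assembly_id):
    """Build a single-allele GET /query URL (Beacon v1-style parameters assumed)."""
    params = {
        "referenceName": reference_name,   # chromosome, e.g. "1"
        "start": start,                    # position, assumed 0-based
        "referenceBases": reference_bases,
        "alternateBases": alternate_bases,
        "assemblyId": assembly_id,         # e.g. "GRCh37"
    }
    return f"{base_url.rstrip('/')}/query?{urlencode(params)}"


def allele_exists(response_text):
    """Return the yes/no answer from a Beacon JSON response body."""
    body = json.loads(response_text)
    if body.get("error"):                  # error is null on success
        raise RuntimeError(body["error"])
    return bool(body["exists"])
```

A caller would fetch `build_query_url(BEACON_URL, "1", 1000, "A", "T", "GRCh37")` with any HTTP client and pass the response text to `allele_exists`. No access request is needed just to learn whether the allele is present.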
Beacon, which grew out of a GA4GH project to demonstrate the willingness of the genomics community to share data, has been implemented across ELIXIR, the European infrastructure for life sciences data, as well as dozens of additional institutions around the globe.
“The new release of the Beacon API builds on existing work by adding support for additional types of genomic variants, making it an even more powerful tool for molecular geneticists around the globe to use in their variant classification efforts,” said Marc Fiume, Co-Founder and CEO at DNAstack and co-lead of the GA4GH Discovery Work Stream, which, together with ELIXIR, maintains the Beacon API. “By demonstrating the ELIXIR Authorization and Authentication Infrastructure (AAI) with the reference implementation, we have also enabled additional risk mitigation strategies to ensure the data served by Beacons is as secure as possible.”
Refget harmonizes the way reference genome sequences are named, making it easier to ensure that analyses are reproducible regardless of the reference sequence used. While many institutions use different names to refer to the same reference, or the same name to refer to different references, the refget API assigns a unique digital identifier to the sequences themselves, rather than the names used to store them. It also provides a standard API to access sequences, sub-sequences, and their metadata.
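The key idea, an identifier computed from the sequence rather than from its name, can be sketched in a few lines of Python. This sketch assumes refget v1-style checksums: the sequence is uppercased and stripped of any character outside A-Z, then hashed with MD5 and with "TRUNC512" (the first 24 bytes of a SHA-512 digest, hex-encoded). The function names are illustrative and not part of any published refget library.

```python
import hashlib
import re


def normalise(sequence):
    """Uppercase the sequence and drop anything outside A-Z (e.g. newlines)."""
    return re.sub(r"[^A-Z]", "", sequence.upper())


def md5_id(sequence):
    """Hex MD5 digest of the normalised sequence (32 hex characters)."""
    return hashlib.md5(normalise(sequence).encode("ascii")).hexdigest()


def trunc512_id(sequence):
    """First 24 bytes of the SHA-512 digest, hex-encoded (48 hex characters)."""
    digest = hashlib.sha512(normalise(sequence).encode("ascii")).digest()
    return digest[:24].hex()


if __name__ == "__main__":
    # Two institutions store the same sequence under "chr1" and "1",
    # with different case and line breaks; the derived IDs still match.
    print(trunc512_id("acgtNNacgt") == trunc512_id("ACGTNNACGT\n"))  # True
```

Because the identifier depends only on the sequence content, the naming disagreements described above stop mattering: the same sequence always yields the same ID, and different sequences sharing a name yield different IDs.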
Refget underlies the increasingly popular CRAM file format for storing genomic read data, which retrieves the reference sequence from EMBL-EBI each time it is used (rather than compressing the reference data into the file alongside the reads, as more traditional formats do). Refget also makes it possible for users to access “mirrors” of the EBI API; deployments within popular compute clouds are due to arrive over the coming months.
“Reference sequences are fundamental to how genomic analysis is performed as they provide a baseline of knowledge of the human genome. Without a clear, unambiguous link back to that baseline it can be challenging to compare, aggregate and share knowledge between researchers and clinical settings,” said Andrew Yates, Team Leader, Genomics Technology Infrastructure at EMBL-EBI and co-lead of the refget subgroup of the GA4GH Large Scale Genomics Work Stream. “I am confident that in time refget will be so fundamental to how reference sequences are referred to and accessed that we will wonder how we managed without it.”
WES allows users to execute the same scientific tools and workflows in a variety of clouds, platforms, and environments without modification. In particular, WES enables users to submit workflow requests to workflow execution systems, and to monitor their execution.
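The submit-then-monitor pattern described above can be sketched in Python as follows. It assumes a WES v1-style server under a `/ga4gh/wes/v1` base path, where a run is submitted to `POST /runs` with `workflow_url`, `workflow_type`, `workflow_type_version` and `workflow_params` form fields, and its progress is read from `GET /runs/{run_id}/status`, whose JSON body carries a `state` field. The host `wes.example.org` is hypothetical, and the HTTP call is passed in as a function so the sketch stays independent of any particular client library.

```python
import json
import time

# Hypothetical WES deployment; only this base URL changes between platforms.
WES_BASE = "https://wes.example.org/ga4gh/wes/v1"

# States after which a run will not change again (assumed from the v1 state list).
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def run_request_fields(workflow_url, workflow_type, type_version, params):
    """Form fields for POST {WES_BASE}/runs, sent as multipart/form-data."""
    return {
        "workflow_url": workflow_url,          # e.g. a CWL or WDL file URL
        "workflow_type": workflow_type,        # e.g. "CWL" or "WDL"
        "workflow_type_version": type_version,
        "workflow_params": json.dumps(params), # inputs as a JSON string
    }


def status_url(run_id, base=WES_BASE):
    """URL that reports the current state of one run."""
    return f"{base}/runs/{run_id}/status"


def wait_for_run(run_id, get_json, poll_seconds=10.0, sleep=time.sleep):
    """Poll a run's status until it reaches a terminal state, then return it.

    get_json is any callable that performs an HTTP GET on a URL and returns
    the decoded JSON body.
    """
    while True:
        state = get_json(status_url(run_id))["state"]
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)
```

Because the request fields and status endpoint are the same on every compliant server, running the same workflow elsewhere only means pointing `WES_BASE` at a different deployment; the workflow itself is left unmodified.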
“In order for researchers to analyze the world’s collective genomic information, they need to be able to execute the same scientific workflows in a variety of environments without modifying them each time around,” said Brian O’Connor, Technical Director of the UCSC Computational Genomics Platform and co-lead of the Cloud Work Stream, which maintains the WES API. “WES provides a mechanism to do exactly that, ensuring compatible results when running genomic alignment, variant calling, variant interpretation, and more.”
The Global Alliance for Genomics and Health (GA4GH) is an international, nonprofit alliance formed in 2013 to accelerate the potential of research and medicine to advance human health. Bringing together 500+ leading organizations working in healthcare, research, patient advocacy, life science, and information technology, the GA4GH community is working together to create frameworks and standards to enable the responsible, voluntary, and secure sharing of genomic and health-related data. All of GA4GH builds upon the Framework for Responsible Sharing of Genomic and Health-Related Data.