The GA4GH Strategic Roadmap, developed in 2018, presents standards and frameworks planned for development under GA4GH Connect, a 5-year Strategic Plan aimed at aligning with the key needs of the genomic data community.
The GA4GH Authentication and Authorization Infrastructure (AAI) Profile is the GA4GH standard technical profile for authenticating the identity of individuals seeking to access data and services offered by the Driver Projects, and for authorizing access in accordance with applicable Driver Project policies. AAI is based on the IETF OAuth 2.0 standard, and the OpenID Connect identity layer based on OAuth 2.0, and incorporates the researcher identity passports and data use ontology developed by the Data Use & Researcher Identity (DURI) work stream.
PRODUCT LEAD: David Bernick
CONTACT: Melissa.konopko@ga4gh.org
Work streams
Data Security (primary), Clinical & Phenotypic Data Capture, Discovery, Data Use & Researcher Identities, Genomic Knowledge Standard
Driver projects
ELIXIR Beacon
In a future where human genomics and health data are stored in a federated network of public clouds, there is a need to tightly control and monitor which users access this data. At the same time, it is important to enable a smooth process and remove friction and artificial barriers between researchers and the insights they can glean from the data. This system allows researchers and other users to establish identity and credential claims with regard to their professional identity in order to acquire access across datasets.
PRODUCT LEADS: Craig Voisin, Sarion Bowers
CONTACT: Melissa.konopko@ga4gh.org
Work streams
Discovery, Cloud, Large Scale Genomics
Driver projects
ELIXIR Beacon, EVA/EGA/ENA
The ability to access (read and write) data across multiple clouds is a key concern for researchers, especially as large, multi-institution projects leverage cloud resources in multiple environments. This API allows consumers to access data regardless of the repository in which it is stored or managed, making it easier to work across projects and environments.
PRODUCT LEADS: David Glazer, Brian O’Connor
CONTACT: Rishi.Nag@ga4gh.org
Work streams
Cloud (primary), Discovery, Data Security, Data Use & Researcher Identities, Large Scale Genomics
Driver projects
Australian Genomics, EVA/EGA/ENA, Genomics England, Human Cell Atlas, TOPMed
Genomic data can, by its nature, include confidential information about the health of individuals. It is important that such information is not accidentally disclosed. One part of the defense against such disclosure is to keep the data in an encrypted format as much as possible. Crypt4GH is a file format that can be used to store data in an encrypted and authenticated state. The choice of encryption also allows the encrypted data to be read starting from any location, facilitating indexed access to files.
PRODUCT LEADS: Alexander Senf, Rob Davies
CONTACT: Rishi.Nag@ga4gh.org
Work streams
Large Scale Genomics
Driver projects
European Genome-phenome Archive (EGA), Australian Genomics Health Alliance (AGHA)
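To illustrate the Crypt4GH entry above: reading "from any location" is possible when data is encrypted in independent, fixed-size segments, so a reader only needs to find and decrypt the segment that holds the requested byte. The sketch below shows only that offset arithmetic. The segment size, nonce length, and tag length are assumptions based on the published Crypt4GH specification rather than on this roadmap, and `locate` and `header_len` are hypothetical names.

```python
from typing import Tuple

SEGMENT_PLAINTEXT = 65536   # assumed plaintext bytes per segment
NONCE_LEN = 12              # assumed per-segment nonce length
TAG_LEN = 16                # assumed per-segment authentication tag length
SEGMENT_CIPHERTEXT = NONCE_LEN + SEGMENT_PLAINTEXT + TAG_LEN


def locate(plain_offset: int, header_len: int) -> Tuple[int, int, int]:
    """Map a plaintext offset to (segment index, file offset of the
    encrypted segment, offset inside the decrypted segment)."""
    if plain_offset < 0:
        raise ValueError("offset must be non-negative")
    segment, within = divmod(plain_offset, SEGMENT_PLAINTEXT)
    # Encrypted segments are assumed to follow the header back to back.
    file_offset = header_len + segment * SEGMENT_CIPHERTEXT
    return segment, file_offset, within
```

A reader would seek to the returned file offset, decrypt that one segment, and skip to the offset inside it; no earlier segment needs to be touched.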
This specification is a standardised, extensible data model and message schema for the representation of variants. It builds heavily on the work of the Variant Modeling Consortium and will expand that schema to include support for structural and complex variants.
PRODUCT LEADS: Larry Babb, Reece Hart, Alex Wagner
CONTACT: Melissa.konopko@ga4gh.org
Work streams
Genomic Knowledge Standards
Driver projects
ClinGen, ELIXIR Beacon, Genomics England, Monarch Initiative, Variant Interpretation for Cancer Consortium (VICC)
The Service Info/Registry provides a digital network infrastructure for a proposed “Internet of Genomics”. The registry lists GA4GH services (e.g., Beacons, DRS) or other registries (e.g., Matchmaker Exchange) that have been registered to it. The Service Info/Registry allows for dynamic registration of online GA4GH APIs (data, tools, services) to enable their real-time, on-demand discovery and use.
PRODUCT LEAD: Miro Cupak
CONTACT: Rishi.Nag@ga4gh.org
Work streams
Discovery, Cloud, Large Scale Genomics
Driver projects
ELIXIR Beacon, EVA/EGA/ENA
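The registration and discovery pattern described in the Service Info/Registry entry above can be sketched as a toy in-memory registry. This is a minimal illustration, not the GA4GH schema: the `ServiceRecord` fields, the `Registry` class, and the example identifiers and URLs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ServiceRecord:
    id: str
    name: str
    artifact: str  # kind of service, e.g. "beacon" or "drs" (illustrative labels)
    url: str


class Registry:
    """Toy in-memory registry supporting registration and lookup by kind."""

    def __init__(self) -> None:
        self._services: Dict[str, ServiceRecord] = {}

    def register(self, record: ServiceRecord) -> None:
        # Registering an id that already exists replaces the earlier record.
        self._services[record.id] = record

    def discover(self, artifact: str) -> List[ServiceRecord]:
        # Return every registered service of the requested kind, sorted by id.
        matches = (s for s in self._services.values() if s.artifact == artifact)
        return sorted(matches, key=lambda s: s.id)


registry = Registry()
registry.register(ServiceRecord("org.example.beacon", "Example Beacon",
                                "beacon", "https://beacon.example.org"))
registry.register(ServiceRecord("org.example.drs", "Example DRS",
                                "drs", "https://drs.example.org"))
```

A client would call `registry.discover("beacon")` to find every Beacon currently registered, without knowing any service URL in advance.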
The portable exchange of tools and workflows is key to scientific reproducibility. The TRS standard, together with its implementation in Dockstore.org, is designed to robustly address this need.
PRODUCT LEADS: Denis Yuen, Susheel Varma
CONTACT: Rishi.Nag@ga4gh.org
Work streams
Cloud, Discovery, Data Security
Driver projects
Australian Genomics, ENA/EGA/EVA, Genomics England, Human Cell Atlas, TOPMed
While ontologies and terminologies provide the standard data concept definitions for capturing clinical information, an information model is required to successfully exchange that information between clinical information systems and with related information systems. This standard provides information models with different levels of complexity to enable high level clinical phenotype information as well as deep clinical phenotype information to be exchanged.
PRODUCT LEADS: Peter Robinson, Jules Jacobsen
CONTACT: Lindsay.Smith@ga4gh.org
Work streams
Clinical & Phenotypic Data Capture (primary), Discovery, Genomic Knowledge Standards
Driver projects
Australian Genomics, Monarch Initiative, ELIXIR Beacon, Matchmaker Exchange, Variant Interpretation for Cancer Consortium (VICC), BRCA Challenge, ENA / EVA / EGA, Clinical Genome Resource (ClinGen)
Beacon is a platform for the global sharing and discovery of genomic variants. A “Beacon” is defined as a web-accessible service that can be queried for information about a specific allele. A user of a Beacon can pose queries of the form “Have you observed this nucleotide (e.g., C) at this genomic location (e.g., position 32,936,732 on chromosome 13)?” to which the Beacon responds with either “yes” or “no”, plus additional metadata. In this way, a Beacon allows allelic information of interest to be discovered by a remote searcher with no reference to a specific sample or patient of origin, thereby mitigating risks to patient/participant privacy.
PRODUCT LEADS: Jordi Rambla, Tony Brookes
CONTACT: Rishi.Nag@ga4gh.org
Work streams
Discovery, Data Use & Researcher Identities (DURI)
Driver projects
ELIXIR Beacon, ENA / EVA / EGA, Genomics England, Australian Genomics
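The yes/no query described in the Beacon entry above can be sketched with a small in-memory lookup. This is a minimal sketch under stated assumptions: the `beacon_query` function, the response field names, and the toy data are hypothetical and do not reproduce the Beacon API itself. The key point is that the answer is derived from aggregated allele observations and never exposes a sample or patient identifier.

```python
from typing import Dict, Set, Tuple

# (chromosome, position) -> alleles observed at that position; toy data only.
Observed = Dict[Tuple[str, int], Set[str]]


def beacon_query(observed: Observed, chrom: str, pos: int, allele: str) -> dict:
    """Report whether the allele has been observed at the given location."""
    normalized = allele.upper()
    exists = normalized in observed.get((chrom, pos), set())
    # Only the yes/no answer and the echoed query are returned.
    return {"exists": exists, "chromosome": chrom,
            "position": pos, "allele": normalized}


# Toy dataset using the example location from the text.
observed = {("13", 32936732): {"C"}}
```

Calling `beacon_query(observed, "13", 32936732, "C")` answers the example question from the text, while a query for an allele that is absent from the toy data returns `exists` as `False`.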
DUO allows data holders to semantically tag datasets with restrictions on their usage, making them automatically discoverable based on the intended usage. It enables machine-readable descriptions of data access requests and data use restrictions to be matched, alleviating the need for manual review when datasets are requested by researchers.
PRODUCT LEADS: Melanie Courtot, Jonathan Lawson
CONTACT: Melissa.Konopko@ga4gh.org
Work streams
Data Use & Researcher Identities (primary), Data Security, Regulatory & Ethics
Driver projects
EVA/EGA/ENA, Australian Genomics, All of Us
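The automated matching described in the DUO entry above can be sketched as a lookup between a dataset's usage tag and the purpose stated in an access request. The tag names, the `ALLOWS` hierarchy, and `is_permitted` are illustrative assumptions only; they are not the official DUO terms or matching semantics.

```python
from typing import Optional

# Assumed hierarchy: each dataset tag lists the request purposes it allows.
ALLOWS = {
    "general_research": {"general_research", "health_research", "disease_specific"},
    "health_research": {"health_research", "disease_specific"},
    "disease_specific": {"disease_specific"},
}


def is_permitted(dataset_tag: str, request_purpose: str,
                 dataset_disease: Optional[str] = None,
                 request_disease: Optional[str] = None) -> bool:
    """Return True when the request purpose fits the dataset restriction."""
    if request_purpose not in ALLOWS.get(dataset_tag, set()):
        return False
    if dataset_tag == "disease_specific":
        # A disease-restricted dataset also requires the same disease.
        return dataset_disease is not None and dataset_disease == request_disease
    return True
```

With tags like these attached to datasets, a request that fits the restriction can be approved by a rule check, and only the non-matching cases need a human reviewer.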
A key challenge for human genetics is the ability to share large volumes of genomic data between different locations to enable discovery of new genetic associations or provide supporting evidence for new findings. Today, this is largely achieved by copying and transferring large files between two services. However, this approach by definition requires a file and therefore restricts the development of novel strategies for storing and indexing genomic data. We are proposing to develop a secure standard interface for slicing and streaming sequencing data that removes the assumption that a file exists at the remote location. It will build upon the incumbent sequencing file formats and use these as the on-the-wire format.
PRODUCT LEADS: Mike Lin, Thomas Keane
CONTACT: Rishi.Nag@ga4gh.org
Work streams
Large Scale Genomics
Driver projects
Australian Genomics, Canadian Distributed Infrastructure for Genomics (CanDIG), Genomics England, EVA/EGA/ENA, Human Cell Atlas
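The slicing interface proposed in the entry above can be illustrated from the client side: instead of downloading a whole file, the client names a dataset and a genomic region and asks for just that slice. The sketch below only builds such a request URL. The path layout and parameter names follow the published htsget specification as an assumption (this roadmap does not define them), and the host name, identifier, and `reads_slice_url` helper are hypothetical.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def reads_slice_url(base_url: str, read_id: str, reference_name: str,
                    start: Optional[int] = None, end: Optional[int] = None,
                    fmt: str = "BAM") -> str:
    """Build a URL that requests one genomic region of a read set."""
    if start is not None and end is not None and start >= end:
        raise ValueError("start must be smaller than end")
    params = {"format": fmt, "referenceName": reference_name}
    if start is not None:
        params["start"] = str(start)
    if end is not None:
        params["end"] = str(end)
    # The identifier is percent-encoded so it is safe as a path segment.
    path_id = quote(read_id, safe="")
    return f"{base_url.rstrip('/')}/reads/{path_id}?{urlencode(params)}"
```

The server is then free to store and index the data however it likes, as long as it can answer the region request in an existing sequencing format.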
At its core, genetics is about examining differences in the DNA sequence across individuals or species. This API provides a framework to retrieve ‘reference sequences’ by a unique checksum, allowing users to retrieve such reference sequences without ambiguity from different databases and servers.
PRODUCT LEADS: Andy Yates, Rasko Leinonen
CONTACT: Rishi.Nag@ga4gh.org
Work streams
Large Scale Genomics, Genomic Knowledge Standards
Driver projects
ENA / EVA / EGA, Australian Genomics
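Retrieval by checksum, as described in the reference sequence entry above, means the identifier is computed from the sequence itself, so two servers holding the same sequence answer to the same identifier. A minimal sketch, assuming an MD5 digest of the upper-cased sequence as the checksum; the actual algorithm and normalization rules are defined by the specification, and `SequenceStore` is a hypothetical in-memory stand-in for a server.

```python
import hashlib
from typing import Dict, Optional


def sequence_checksum(sequence: str) -> str:
    """MD5 hex digest of the upper-cased sequence (assumed checksum scheme)."""
    return hashlib.md5(sequence.upper().encode("ascii")).hexdigest()


class SequenceStore:
    """Toy store that indexes sequences by their checksum."""

    def __init__(self) -> None:
        self._by_checksum: Dict[str, str] = {}

    def add(self, sequence: str) -> str:
        checksum = sequence_checksum(sequence)
        self._by_checksum[checksum] = sequence.upper()
        return checksum

    def get(self, checksum: str, start: int = 0,
            end: Optional[int] = None) -> str:
        # Raises KeyError for an unknown checksum; supports sub-sequences.
        return self._by_checksum[checksum][start:end]
```

Because the identifier depends only on the sequence content, naming differences between databases (for example "chr1" versus "1") cannot cause the wrong sequence to be returned.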
The ability to execute the same scientific tools and workflows in a variety of environments without modification is a key concern for researchers. WES provides a standard that allows researchers to do just this. In particular, this standard will enable disparate platforms to accept and run workflows in Common Workflow Language (CWL), Workflow Description Language (WDL), and possibly other formats, using a common API.
PRODUCT LEADS: James Eddy, Ruchi Munshi, Walt Shands
CONTACT: Rishi.Nag@ga4gh.org
Work streams
Cloud, Discovery, Data Security
Driver projects
Australian Genomics, ENA/EGA/EVA, Genomics England, Human Cell Atlas, TOPMed
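The "common API" idea in the WES entry above can be sketched as a single request shape that any compliant platform would accept, whatever workflow language is used. The field names below follow the published WES specification as an assumption (this roadmap does not list them), and `build_run_request`, the example URL, and the parameter values are hypothetical.

```python
import json
from typing import Any, Dict

SUPPORTED_TYPES = {"CWL", "WDL"}  # languages named in the text


def build_run_request(workflow_url: str, workflow_type: str,
                      workflow_type_version: str,
                      params: Dict[str, Any]) -> Dict[str, str]:
    """Assemble the form fields for submitting one workflow run."""
    if workflow_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported workflow type: {workflow_type}")
    return {
        "workflow_url": workflow_url,
        "workflow_type": workflow_type,
        "workflow_type_version": workflow_type_version,
        # Parameters are serialized to JSON so every field is a string.
        "workflow_params": json.dumps(params, sort_keys=True),
    }


request = build_run_request("https://example.org/align.wdl", "WDL", "1.0",
                            {"input": "reads.bam"})
```

The same dictionary could be posted to any platform that implements the standard, which is what lets a workflow move between environments without modification.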
SAM, BAM and CRAM are standard formats for genomic data that require continued maintenance and development as our capability to interrogate genomic information changes with new technologies. This team will maintain and evolve these primary file formats.
PRODUCT LEADS: James Bonfield, Louis Bergelson
CONTACT: Rishi.Nag@ga4gh.org
Work streams
Large Scale Genomics
Driver projects
EVA/EGA/ENA, Human Cell Atlas, Genomics England, Australian Genomics, TOPMed
The Breach Response Protocol is a jointly developed strategy, and supporting processes, through which the GA4GH Driver Project community can collaboratively protect itself and effectively respond to and recover from security breaches. This deliverable will be a flexible Best Practices document which will allow genomic data sharing organizations to (i) monitor for and detect breaches, (ii) ascertain whether a breach involves one or more GA4GH standards, (iii) collaboratively share information regarding breaches that involve GA4GH standards, and (iv) support response to and recovery from breaches.
PRODUCT LEAD: Kate Birch
CONTACT: Melissa.Konopko@ga4gh.org
Work streams
Data Security (primary), Regulatory & Ethics
Driver projects
Australian Genomics, All of Us Project, CanDIG, ClinGen, BRCA Challenge, ELIXIR Beacon, EVA/EGA/ENA, NCI Genomic Data Commons, Genomics England, Human Cell Atlas, TOPMed, ICGC-ARGO, Matchmaker Exchange, Monarch Initiative, VICC
VCF is a standard format for representing genomic variation. It requires maintenance and updates to represent new genomic information in an unambiguous manner. In addition to maintaining and evolving VCF, this team will investigate and research new, more scalable formats for storing and exchanging genetic variation.
PRODUCT LEADS: Yossi Farjoun, Daniel Cameron
CONTACT: Rishi.Nag@ga4gh.org
Work streams
Large Scale Genomics (primary), Genomic Knowledge Standards
Driver projects
NCI Genomic Data Commons, ENA/EVA/EGA, VICC, ClinGen
Other partners
Wellcome Trust Sanger Institute, Broad Institute of MIT and Harvard
The multilingual International Participant Values Survey, or “Your DNA, Your Say,” explores how people around the world feel about the collection, use, and sharing of genetic and health data for research, including attitudes about genetic exceptionalism, reasons for sharing or not sharing, and the perceived benefits or harms involved.
PRODUCT LEADS: Anna Middleton, Richard Milne
CONTACT: rews-coordinator@ga4gh.org
Work streams
Regulatory & Ethics
Driver projects
TBD
Pedigree data is currently represented in heterogeneous formats that frequently result in the use of lowest-common-denominator formats (e.g., PED) or custom JSON formats for data transfer. The need for high quality, unambiguous, computable pedigree and family history information is critical for scaling genomic analysis to larger, complex families.
PRODUCT LEADS: Orion Buske, Grant Wood
CONTACT: lindsay.smith@ga4gh.org
Work streams
Clinical & Phenotypic Data Capture (primary), Discovery, Genomic Knowledge Standards
Driver projects
Australian Genomics, Monarch Initiative, All of Us Research Program, ELIXIR Beacon, Clinical Genome Resource (ClinGen), Matchmaker Exchange, Variant Interpretation for Cancer Consortium (VICC), BRCA Challenge
This document aims to inform research policy makers and projects about what to consider when deciding whether to tell participants about genomic findings relevant to their health. It will include international ethical, legal, and policy guidance on the return to research participants of clinically relevant individual findings (e.g., individual research results, incidental findings) generated by whole genome/exome sequencing, and will consider developments in data sharing practices.
PRODUCT LEADS: Madeleine Murtagh, Robert Green
CONTACT: rews-coordinator@ga4gh.org
Work streams
Regulatory & Ethics, Data Use & Researcher Identities
Driver projects
All of Us Project, Genomics England, Australian Genomics
The GA4GH Search API enables a search engine for genomic and clinical data by providing a specification for a query language across genomic, phenotypic, and clinical data. It can be used to implement, for example, Beacons and Matchmakers, as well as other applications (e.g., diagnostics, pharmacogenomics, family analysis).
PRODUCT LEADS: Miro Cupak, Aaron Kemp
CONTACT: Rishi.Nag@ga4gh.org
Work streams
Discovery, Cloud, Large Scale Genomics
Driver projects
ELIXIR Beacon, EVA/EGA/ENA
Every compute environment has a different API for the batch execution of tasks. For example, each of the three major cloud vendors provides this service, but using completely different APIs. By providing a common interface that abstracts over their differences, compute engines can quickly move from one compute system to the next.
Work streams
Cloud, Discovery, Data Security
Driver projects
TBD
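The "common interface that abstracts over their differences" in the task execution entry above can be sketched as one small interface with interchangeable backends. This is an illustration of the design idea only: `TaskBackend`, `InMemoryBackend`, `run_everywhere`, and the state names are hypothetical, and the toy backend records tasks instead of running them.

```python
from typing import Dict, List, Protocol


class TaskBackend(Protocol):
    """The single interface a compute engine would program against."""

    def submit(self, command: List[str]) -> str: ...

    def status(self, task_id: str) -> str: ...


class InMemoryBackend:
    """Stand-in for one vendor's batch system; it only records tasks."""

    def __init__(self, prefix: str) -> None:
        self._prefix = prefix
        self._tasks: Dict[str, List[str]] = {}

    def submit(self, command: List[str]) -> str:
        task_id = f"{self._prefix}-{len(self._tasks) + 1}"
        self._tasks[task_id] = list(command)
        return task_id

    def status(self, task_id: str) -> str:
        # Recorded tasks are reported as finished; nothing is executed.
        return "COMPLETE" if task_id in self._tasks else "UNKNOWN"


def run_everywhere(backends: List[TaskBackend], command: List[str]) -> List[str]:
    """Submit the same task to every backend through the shared interface."""
    return [backend.submit(command) for backend in backends]
```

Because the calling code depends only on `submit` and `status`, moving from one compute system to another means swapping the backend object rather than rewriting the engine.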
This project aims to demonstrate that workflows can be exchanged between Driver Project sites and used reproducibly, using preliminary versions of the GA4GH Cloud APIs (TES, TRS, WES, and DOS).
Work streams
Cloud, Data Security
Driver projects
Australian Genomics, ENA/EGA/EVA, Genomics England, Human Cell Atlas, TOPMed
This common data model will guide the linkage of annotations and structured clinical interpretations to variant data. It will include support for current clinical lab standards (e.g., ACMG/AMP), clinical phenotypes (disease/disorder), clinical relevance and context, and associated metadata.
PRODUCT LEADS: Matt Brush, Javier Lopez
CONTACT: Melissa.Konopko@ga4gh.org
Work streams
Genomic Knowledge Standards (primary), Clinical & Phenotypic Data Capture, Discovery
Driver projects
ClinGen, Variant Interpretation for Cancer Consortium (VICC), Genomics England, BRCA Challenge, Monarch Initiative