Strategic Roadmap


The GA4GH Strategic Roadmap, developed in 2018, presents standards and frameworks planned for development under GA4GH Connect, a five-year Strategic Plan aimed at aligning with the key needs of the genomic data community.

Approved Standards

Authentication and Authorization Infrastructure (AAI)

The GA4GH Authentication and Authorization Infrastructure (AAI) Profile is the GA4GH standard technical profile for authenticating the identity of individuals seeking to access data and services offered by the Driver Projects, and for authorizing access in accordance with applicable Driver Project policies. AAI is based on the IETF OAuth 2.0 standard and the OpenID Connect identity layer built on top of it, and incorporates the researcher identity passports and data use ontology developed by the Data Use & Researcher Identity (DURI) work stream.

PRODUCT LEAD: David Bernick
CONTACT: Melissa.konopko@ga4gh.org

Contributors

Work streams

Data Security (primary), Clinical & Phenotypic Data Capture, Discovery, Data Use & Researcher Identities, Genomic Knowledge Standards

Driver projects

ELIXIR Beacon

GA4GH Passports

In a future where human genomics and health data is stored in a federated network of public clouds, there is a need to tightly control and monitor which users access this data. At the same time, it is important to enable a smooth process and remove friction and artificial barriers between researchers and the insights they can glean from the data. This system allows researchers and other users to establish identity and credential claims about their professional identity in order to gain access across datasets.
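
As a rough illustration of how a credential claim might be evaluated, the sketch below checks a decoded visa payload for a controlled-access grant. It assumes the Passport visa layout (a `ga4gh_visa_v1` object with `type` and `value` fields next to a standard JWT `exp` claim). The function name, issuer, subject, and dataset URL are hypothetical, and signature verification, which a real service must perform first, is left out.

```python
import time
from typing import Optional


def visa_grants_access(visa_payload: dict, dataset: str,
                       now: Optional[float] = None) -> bool:
    """Return True if an already-verified visa payload grants access to `dataset`.

    Assumes a `ga4gh_visa_v1` object with `type` and `value` fields and a
    JWT `exp` claim. Signature verification is deliberately out of scope.
    """
    now = time.time() if now is None else now
    visa = visa_payload.get("ga4gh_visa_v1", {})
    return (
        visa.get("type") == "ControlledAccessGrants"
        and visa.get("value") == dataset
        and visa_payload.get("exp", 0) > now
    )


# Hypothetical payload: every identifier below is a placeholder.
example_visa = {
    "iss": "https://broker.example.org",
    "sub": "researcher-123",
    "exp": 2_000_000_000,
    "ga4gh_visa_v1": {
        "type": "ControlledAccessGrants",
        "asserted": 1_600_000_000,
        "value": "https://data.example.org/datasets/710",
        "source": "https://dac.example.org",
        "by": "dac",
    },
}
```

A service would call `visa_grants_access(example_visa, "https://data.example.org/datasets/710")` only after validating the token that carried the payload.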

PRODUCT LEADS: Craig Voisin, Sarion Bowers
CONTACT: Melissa.konopko@ga4gh.org

Contributors

Work streams

Discovery, Cloud, Large Scale Genomics

Driver projects

ELIXIR Beacon, EVA/EGA/ENA

Data Repository Service (DRS) API

The ability to access (read and write) data across multiple clouds is a key concern for researchers, especially as large, multi-institution projects leverage cloud resources in multiple environments. This API allows consumers to access data regardless of the repository in which it is stored or managed, making it easier to work across projects and environments.
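
To make repository-independent access more concrete, the sketch below maps a hostname-based DRS URI to the HTTPS endpoint a client would fetch. It assumes the DRS 1.x convention that `drs://<hostname>/<id>` resolves to `https://<hostname>/ga4gh/drs/v1/objects/<id>`. The function name, host, and object ID are hypothetical.

```python
from urllib.parse import quote, urlparse


def drs_uri_to_url(drs_uri: str) -> str:
    """Map a hostname-based DRS URI to the HTTPS URL of its object record.

    Assumes the DRS 1.x path layout; other versions may differ.
    """
    parsed = urlparse(drs_uri)
    if parsed.scheme != "drs" or not parsed.netloc:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri!r}")
    object_id = parsed.path.lstrip("/")
    if not object_id:
        raise ValueError("DRS URI has no object ID")
    # Percent-encode the ID so it stays a single path segment.
    return f"https://{parsed.netloc}/ga4gh/drs/v1/objects/{quote(object_id, safe='')}"


# Hypothetical host and object ID, for illustration only.
print(drs_uri_to_url("drs://drs.example.org/314159"))
```

The client never needs to know which cloud holds the bytes; the returned record is what points to the actual access method.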

PRODUCT LEADS: David Glazer, Brian O’Connor
CONTACT: Rishi.Nag@ga4gh.org

Contributors

Work streams

Cloud (primary), Discovery, Data Security, Data Use & Researcher Identities, Large Scale Genomics

Driver projects

Australian Genomics, EVA/EGA/ENA, Genomics England, Human Cell Atlas, TOPMed

Crypt4GH

By its nature, genomic data can include information of a confidential nature about the health of individuals. It is important that such information is not accidentally disclosed. One part of the defense against such disclosure is to, as much as possible, keep the data in an encrypted format. Crypt4GH is a file format that can be used to store data in an encrypted and authenticated state. The choice of encryption also allows the encrypted data to be read starting from any location, facilitating indexed access to files.
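
The random-access property can be sketched with a little arithmetic. The example below assumes the layout described in the published Crypt4GH specification: data is encrypted in independent segments of 65,536 plaintext bytes, each stored with a 12-byte nonce and a 16-byte authentication tag. The function name is hypothetical, and offsets are relative to the end of the file header.

```python
from typing import Tuple

SEGMENT_PLAINTEXT = 65536      # plaintext bytes per segment (assumed from the spec)
SEGMENT_OVERHEAD = 12 + 16     # per-segment nonce plus authentication tag
SEGMENT_ENCRYPTED = SEGMENT_PLAINTEXT + SEGMENT_OVERHEAD


def encrypted_range(plain_start: int, plain_end: int) -> Tuple[int, int]:
    """Return the encrypted byte range [start, end) needed to recover
    plaintext bytes [plain_start, plain_end).

    The end offset is an upper bound: the final segment of a file may be
    shorter than a full segment.
    """
    if not 0 <= plain_start < plain_end:
        raise ValueError("need 0 <= plain_start < plain_end")
    first_segment = plain_start // SEGMENT_PLAINTEXT
    last_segment = (plain_end - 1) // SEGMENT_PLAINTEXT
    return (first_segment * SEGMENT_ENCRYPTED,
            (last_segment + 1) * SEGMENT_ENCRYPTED)
```

Because each segment can be decrypted on its own, a reader that wants one indexed region only has to fetch and authenticate the segments that cover it.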

PRODUCT LEADS: Alexander Senf, Rob Davies
CONTACT: Rishi.Nag@ga4gh.org

Contributors

Work streams

Large Scale Genomics

Driver projects

European Genome-phenome Archive (EGA), Australian Genomics Health Alliance (AGHA)

Variation Representation: Data Model/Specification

This specification is a standardised extensible data model and message schema specification for the representation of variants. It builds heavily on the work of the Variant Modeling Consortium and will expand that schema to include support for structural and complex variants.

PRODUCT LEADS: Larry Babb, Reece Hart, Alex Wagner
CONTACT: Melissa.konopko@ga4gh.org

Contributors

Work streams

Genomic Knowledge Standards

Driver projects

ClinGen, ELIXIR Beacon, Genomics England, Monarch Initiative, Variant Interpretation for Cancer Consortium (VICC)

Service Info/Registry

The Service Info/Registry provides a digital network infrastructure for a proposed “Internet of Genomics”. The registry lists GA4GH services (e.g., Beacons, DRS) or other registries (e.g., Matchmaker Exchange) that have been registered to it. The Service Info/Registry allows for dynamic registration and on-demand discovery of online GA4GH APIs (data, tools, services) so that they can be found and used in real time.
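
A minimal in-memory sketch of registration and discovery is shown below. The record fields (`id`, `name`, `type` with `group`, `artifact`, and `version`, plus `url`) follow the general shape of GA4GH service-info records, but the class, its methods, and the example services are hypothetical and not part of any specification.

```python
from typing import Dict, List

REQUIRED_FIELDS = ("id", "name", "type", "url")


class ToyServiceRegistry:
    """Illustrative registry: stores service records and finds them by type."""

    def __init__(self) -> None:
        self._services: Dict[str, dict] = {}

    def register(self, service: dict) -> None:
        missing = [f for f in REQUIRED_FIELDS if f not in service]
        if missing:
            raise ValueError(f"service record is missing: {missing}")
        # Re-registering the same id replaces the earlier record.
        self._services[service["id"]] = service

    def find(self, artifact: str) -> List[dict]:
        return [s for s in self._services.values()
                if s["type"].get("artifact") == artifact]


registry = ToyServiceRegistry()
registry.register({
    "id": "org.example.beacon",          # placeholder identifiers throughout
    "name": "Example Beacon",
    "type": {"group": "org.ga4gh", "artifact": "beacon", "version": "1.0.0"},
    "url": "https://beacon.example.org",
})
registry.register({
    "id": "org.example.drs",
    "name": "Example DRS",
    "type": {"group": "org.ga4gh", "artifact": "drs", "version": "1.0.0"},
    "url": "https://drs.example.org",
})
print([s["url"] for s in registry.find("beacon")])
```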

PRODUCT LEAD: Miro Cupak
CONTACT: Rishi.Nag@ga4gh.org

Contributors

Work streams

Discovery, Cloud, Large Scale Genomics

Driver projects

ELIXIR Beacon, EVA/EGA/ENA

Tool Registry Service (TRS)

The portable exchange of tools and workflows is key to scientific reproducibility. The TRS standard, together with its implementation in Dockstore.org, is designed to address this need robustly.

PRODUCT LEADS: Denis Yuen, Susheel Varma
CONTACT: Rishi.Nag@ga4gh.org

Contributors

Work streams

Cloud, Discovery, Data Security

Driver projects

Australian Genomics, ENA/EGA/EVA, Genomics England, Human Cell Atlas, TOPMed

Phenopackets

While ontologies and terminologies provide the standard data concept definitions for capturing clinical information, an information model is required to successfully exchange that information between clinical information systems and with related information systems. This standard provides information models with different levels of complexity so that both high-level and deep clinical phenotype information can be exchanged.

PRODUCT LEADS: Peter Robinson, Jules Jacobsen
CONTACT: Lindsay.Smith@ga4gh.org

Contributors

Work streams

Clinical & Phenotypic Data Capture (primary), Discovery, Genomic Knowledge Standards

Driver projects

Australian Genomics, Monarch Initiative, ELIXIR Beacon, Matchmaker Exchange, Variant Interpretation for Cancer Consortium (VICC), BRCA Challenge, ENA / EVA / EGA, Clinical Genome Resource (ClinGen)

Beacon

Beacon is a platform for the global discovery and sharing of genomic variants. A “Beacon” is defined as a web-accessible service that can be queried for information about a specific allele. A user of a Beacon can pose queries of the form “Have you observed this nucleotide (e.g. C) at this genomic location (e.g. position 32,936,732 on chromosome 13)?” to which the Beacon responds with either “yes” or “no”, plus additional metadata. In this way, a Beacon allows allelic information of interest to be discovered by a remote searcher with no reference to a specific sample or patient of origin, thereby mitigating risks to patient/participant privacy.
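
The yes/no behaviour can be sketched with a toy, in-memory Beacon. Everything here is illustrative: the class, field names, and `beaconId` value are assumptions, and a real Beacon is a web service whose request and response schema is defined by the Beacon specification.

```python
from typing import Iterable, NamedTuple


class AlleleQuery(NamedTuple):
    reference_name: str   # chromosome, e.g. "13"
    position: int         # genomic position, as phrased in the question
    alternate_bases: str  # observed nucleotide(s), e.g. "C"


class ToyBeacon:
    """Answers only whether an allele has been observed, never in whom."""

    def __init__(self, observed: Iterable[AlleleQuery]) -> None:
        self._observed = frozenset(observed)

    def query(self, query: AlleleQuery) -> dict:
        # The response carries a boolean plus metadata, but no sample data.
        return {"beaconId": "org.example.toy-beacon",
                "exists": query in self._observed}


beacon = ToyBeacon([AlleleQuery("13", 32936732, "C")])
print(beacon.query(AlleleQuery("13", 32936732, "C")))
print(beacon.query(AlleleQuery("13", 32936732, "T")))
```

The privacy argument rests on the response shape: the caller learns that the allele exists somewhere in the collection and nothing about its origin.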

PRODUCT LEADS: Jordi Rambla, Tony Brookes
CONTACT: Rishi.Nag@ga4gh.org

Contributors

Work streams

Discovery, Data Use & Researcher Identities (DURI)

Driver projects

ELIXIR Beacon, ENA / EVA / EGA, Genomics England, Australian Genomics

Data Use Ontology

DUO allows data holders to semantically tag datasets with restrictions about their usage, making them automatically discoverable based on the intended usage. It enables machine readable descriptions of data access requests and data use restrictions to be matched, alleviating the need for manual review when datasets are requested by researchers.
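
The matching idea can be sketched as a small lookup. The restriction codes `GRU`, `HMB`, and `DS` are DUO shorthand for general research use, health/medical/biomedical research, and disease-specific research. The purpose categories, the function, and the disease identifiers are hypothetical simplifications; real DUO matching works over the full ontology and its modifiers.

```python
from typing import Optional

# Which (simplified, hypothetical) request purposes each restriction allows.
ALLOWED_PURPOSES = {
    "GRU": {"general", "biomedical", "disease"},
    "HMB": {"biomedical", "disease"},
    "DS": {"disease"},
}


def is_permitted(restriction: str, purpose: str,
                 dataset_disease: Optional[str] = None,
                 request_disease: Optional[str] = None) -> bool:
    """Return True if a request purpose is compatible with a dataset tag."""
    if purpose not in ALLOWED_PURPOSES.get(restriction, set()):
        return False
    if restriction == "DS":
        # Disease-specific data may only be used for the tagged disease.
        return dataset_disease is not None and dataset_disease == request_disease
    return True


print(is_permitted("HMB", "disease"))                         # biomedical data, disease study
print(is_permitted("DS", "disease", "disease:A", "disease:B"))  # wrong disease
```

A check of this kind is what lets compatible requests be identified automatically, leaving manual review for the cases that do not match cleanly.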

PRODUCT LEADS: Melanie Courtot, Jonathan Lawson
CONTACT: Melissa.Konopko@ga4gh.org

Contributors

Work streams

Data Use & Researcher Identities (primary), Data Security, Regulatory & Ethics

Driver projects

EVA/EGA/ENA, Australian Genomics, All of Us

htsget API

A key challenge for human genetics is the ability to share large volumes of genomic data between different locations to enable discovery of new genetic associations or provide supporting evidence for new findings. Today, this is largely achieved by copying and transferring large files between two services. However, this approach by definition requires a file and therefore restricts the development of novel strategies for storing and indexing genomic data. htsget is a secure standard interface for slicing and streaming sequencing data that removes the assumption that a file exists at the remote location. It builds upon the incumbent sequencing file formats and uses these as the on-the-wire format.
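
A slice request can be sketched as plain URL construction. The example assumes the htsget `reads` endpoint with its `format`, `referenceName`, `start`, and `end` query parameters, where coordinates are 0-based and half-open. The function name, base URL, and read ID are hypothetical, and no network call is made.

```python
from urllib.parse import quote, urlencode


def htsget_reads_url(base_url: str, read_id: str, reference_name: str,
                     start: int, end: int, fmt: str = "BAM") -> str:
    """Build the URL that asks an htsget server for one genomic slice."""
    if not 0 <= start < end:
        raise ValueError("need 0 <= start < end (0-based, half-open)")
    query = urlencode({
        "format": fmt,
        "referenceName": reference_name,
        "start": start,
        "end": end,
    })
    return f"{base_url.rstrip('/')}/reads/{quote(read_id, safe='')}?{query}"


# Hypothetical server and read ID.
print(htsget_reads_url("https://htsget.example.org", "sample1", "chr1", 0, 1000))
```

The server answers with a ticket describing where to fetch the data blocks, so the client receives only the requested region in a familiar format and never has to download the whole file.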

PRODUCT LEADS: Mike Lin, Thomas Keane
CONTACT: Rishi.Nag@ga4gh.org

Contributors

Work streams

Large Scale Genomics

Driver projects

Australian Genomics, Canadian Distributed Infrastructure for Genomics (CanDIG), Genomics England, EVA/EGA/ENA, Human Cell Atlas

refget API

At its core, genetics is about examining differences in the DNA sequence across individuals or species. This API provides a framework to retrieve ‘reference sequences’ by a unique checksum, allowing users to retrieve such reference sequences without ambiguity from different databases and servers.
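
The checksum idea can be sketched with the standard library alone. The example assumes the two identifiers described in the first refget release: an MD5 digest and a truncated SHA-512 digest (the first 24 bytes, hex-encoded), both computed over the uppercase sequence. The function name is hypothetical, and later versions of the specification may favour other encodings.

```python
import hashlib


def refget_checksums(sequence: str) -> dict:
    """Compute content-derived identifiers for a reference sequence.

    The sequence is uppercased first so that case differences between
    databases do not change the identifier.
    """
    data = sequence.upper().encode("ascii")
    return {
        "md5": hashlib.md5(data).hexdigest(),
        # First 24 bytes of the SHA-512 digest, as 48 hex characters.
        "trunc512": hashlib.sha512(data).digest()[:24].hex(),
    }


print(refget_checksums("ACGT"))
```

Because the identifier is derived from the sequence itself, two servers holding the same sequence under different names will still agree on what to call it.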

PRODUCT LEADS: Andy Yates, Rasko Leinonen
CONTACT: Rishi.Nag@ga4gh.org

Contributors

Work streams

Large Scale Genomics, Genomic Knowledge Standards

Driver projects

ENA / EVA / EGA, Australian Genomics

Workflow Execution Service (WES) API

The ability to execute the same scientific tools and workflows in a variety of environments without modification is a key concern for researchers. WES provides a standard that allows researchers to do just this. In particular, this standard will enable disparate platforms to accept and run workflows in Common Workflow Language (CWL) and Workflow Description Language (WDL), and possibly other formats, using a common API.
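
The "common API" can be sketched as the set of form fields a client would submit to start a run. The field names (`workflow_url`, `workflow_type`, `workflow_type_version`, `workflow_params`) follow the WES run request. The function name, workflow URL, and parameters are hypothetical, and nothing is sent over the network.

```python
import json


def build_wes_run_request(workflow_url: str, workflow_type: str,
                          workflow_type_version: str,
                          workflow_params: dict) -> dict:
    """Assemble the form fields for a WES run submission.

    The same fields are used whether the workflow is CWL or WDL; only
    the type and version values change.
    """
    return {
        "workflow_url": workflow_url,
        "workflow_type": workflow_type,
        "workflow_type_version": workflow_type_version,
        # Parameters travel as a JSON-encoded string.
        "workflow_params": json.dumps(workflow_params, sort_keys=True),
    }


# Hypothetical workflow location and inputs.
request = build_wes_run_request(
    "https://workflows.example.org/align.cwl", "CWL", "v1.0",
    {"reads": "drs://drs.example.org/314159"},
)
print(request)
```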

PRODUCT LEADS: James Eddy, Ruchi Munshi, Walt Shands
CONTACT: Rishi.Nag@ga4gh.org

Contributors

Work streams

Cloud, Discovery, Data Security

Driver projects

Australian Genomics, ENA/EGA/EVA, Genomics England, Human Cell Atlas, TOPMed

Read File Formats (SAM/BAM/CRAM)

SAM, BAM and CRAM are standard formats for genomic data that require continued maintenance and development as our capability to interrogate genomic information changes with new technologies. This team will maintain and evolve these primary file formats.

PRODUCT LEADS: James Bonfield, Louis Bergelson
CONTACT: Rishi.Nag@ga4gh.org

Contributors

Work streams

Large Scale Genomics

Driver projects

EVA/EGA/ENA, Human Cell Atlas, Genomics England, Australian Genomics, TOPMed

Planned Roadmap Deliverables

Breach Response Protocol

The Breach Response Protocol is a jointly developed strategy, and supporting processes, through which the GA4GH Driver Project community can collaboratively protect itself and effectively respond to and recover from security breaches. This deliverable will be a flexible Best Practices document which will allow genomic data sharing organizations to (i) monitor for and detect breaches, (ii) ascertain whether a breach involves one or more GA4GH standards, (iii) collaboratively share information regarding breaches that involve GA4GH standards, and (iv) support response to and recovery from breaches.

PRODUCT LEAD: Kate Birch
CONTACT: Melissa.Konopko@ga4gh.org

Contributors

Work streams

Data Security (primary), Regulatory & Ethics

Driver projects

Australian Genomics, All of Us Project, CanDIG, ClinGen, BRCA Challenge, ELIXIR Beacon, EVA/EGA/ENA, NCI Genomic Data Commons, Genomics England, Human Cell Atlas, TOPMed, ICGC-ARGO, Matchmaker Exchange, Monarch Initiative, VICC

Genetic Variation File Formats

VCF is a standard format to represent genomic variation. It requires maintenance and updates to represent new genomic information in an unambiguous manner. In addition to maintaining and evolving VCF, this team will investigate new, more scalable formats for storing and exchanging genetic variation.

PRODUCT LEADS: Yossi Farjoun, Daniel Cameron
CONTACT: Rishi.Nag@ga4gh.org

Contributors

Work streams

Large Scale Genomics (primary), Genomic Knowledge Standards

Driver projects

NCI Genomic Data Commons, ENA/EVA/EGA, VICC, ClinGen

Other partners

Wellcome Trust Sanger Institute, Broad Institute of MIT and Harvard

International Participant Values Survey

The multilingual International Participant Values Survey, or “Your DNA, Your Say,” explores how people around the world feel about the collection, use, and sharing of genetic and health data for research, including attitudes toward genetic exceptionalism, reasons for sharing or not sharing, and the benefits or harms people perceive.

PRODUCT LEADS: Anna Middleton, Richard Milne
CONTACT: rews-coordinator@ga4gh.org

Contributors

Work streams

Regulatory & Ethics

Driver projects

TBD

Pedigree Representation

Pedigree data is currently represented in heterogeneous formats that frequently result in the use of lowest-common-denominator formats (e.g., PED) or custom JSON formats for data transfer. High-quality, unambiguous, computable pedigree and family history information is critical for scaling genomic analysis to larger, complex families.
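
To show how little a lowest-common-denominator format carries, the sketch below parses one line of a PED file. It assumes the conventional six leading columns (family ID, individual ID, father ID, mother ID, sex, phenotype), with `0` meaning an unknown parent and sex coded as 1 for male and 2 for female. The type and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class PedRecord(NamedTuple):
    family_id: str
    individual_id: str
    father_id: Optional[str]
    mother_id: Optional[str]
    sex: str
    phenotype: str


def parse_ped_line(line: str) -> PedRecord:
    """Parse the six leading whitespace-separated columns of a PED line."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("a PED line needs at least six columns")
    family, individual, father, mother, sex, phenotype = fields[:6]
    return PedRecord(
        family_id=family,
        individual_id=individual,
        father_id=None if father == "0" else father,
        mother_id=None if mother == "0" else mother,
        sex={"1": "male", "2": "female"}.get(sex, "unknown"),
        phenotype=phenotype,
    )


print(parse_ped_line("FAM1 child1 dad1 mum1 2 2"))
```

Only parent-child links and a single phenotype code survive; relationships such as twins, adoption, or consanguinity have nowhere to go, which is the gap a richer pedigree representation is meant to fill.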

PRODUCT LEADS: Orion Buske, Grant Wood
CONTACT: lindsay.smith@ga4gh.org

Contributors

Work streams

Clinical & Phenotypic Data Capture (primary), Discovery, Genomic Knowledge Standards

Driver projects

Australian Genomics, Monarch Initiative, All of Us Research Program, ELIXIR Beacon, Clinical Genome Resource (ClinGen), Matchmaker Exchange, Variant Interpretation for Cancer Consortium (VICC), BRCA Challenge

Return of Results Policy

This document will aim to inform research policy makers and projects about what to consider when deciding whether to tell participants about genomic findings relevant to their health. It will include international ethical, legal, and policy guidance around the return to research participants of clinically relevant individual findings (e.g., individual research results, incidental findings) generated by whole genome/exome sequencing, and will consider developments in data sharing practices.

PRODUCT LEADS: Madeleine Murtagh, Robert Green
CONTACT: rews-coordinator@ga4gh.org

Contributors

Work streams

Regulatory & Ethics, Data Use & Researcher Identities

Driver projects

All of Us Project, Genomics England, Australian Genomics

Search

The GA4GH Search API enables a search engine for genomic and clinical data by providing a specification for a query language across genomic, phenotypic, and clinical data. The specification can be used to implement, for example, Beacons and Matchmakers, as well as other applications (e.g., diagnostics, pharmacogenomics, family analysis).

PRODUCT LEADS: Miro Cupak, Aaron Kemp
CONTACT: Rishi.Nag@ga4gh.org

Contributors

Work streams

Discovery, Cloud, Large Scale Genomics

Driver projects

ELIXIR Beacon, EVA/EGA/ENA

Task Execution Service (TES)

Every compute environment has a different API for the batch execution of tasks. For example, each of the three major cloud vendors provides this service, but through completely different APIs. By providing a common interface that abstracts over these differences, TES allows compute engines to move quickly from one compute system to the next.
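
The common interface can be sketched as a backend-neutral task document. The example assumes the basic shape of a TES task, a `name` plus a list of `executors` that each give a container `image` and a `command`. The helper name, image, and command are hypothetical, and a real task would usually also declare inputs, outputs, and resources.

```python
import json
from typing import List


def build_tes_task(name: str, image: str, command: List[str]) -> dict:
    """Describe one batch task without reference to any vendor API."""
    if not command:
        raise ValueError("command must not be empty")
    return {
        "name": name,
        "executors": [{"image": image, "command": list(command)}],
    }


# Hypothetical task: the same document could be sent to any TES endpoint.
task = build_tes_task("say-hello", "alpine:3.19", ["echo", "hello"])
print(json.dumps(task, indent=2))
```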

Contributors

Work streams

Cloud, Discovery, Data Security

Driver projects

TBD

Testbed & Interoperability Demonstration

This project aims to demonstrate that workflows can be exchanged between Driver Project sites and used reproducibly, using preliminary versions of the GA4GH Cloud APIs (TES, TRS, WES, and DOS).

Contributors

Work streams

Cloud, Data Security

Driver projects

Australian Genomics, ENA/EGA/EVA, Genomics England, Human Cell Atlas, TOPMed

Variant Annotation: Data Model

This common data model will guide the linkage of annotations and structured clinical interpretations to variant data. It will include support for current clinical lab standards (e.g., ACMG/AMP), clinical phenotypes (disease/disorder), clinical relevance and context, and associated metadata.

PRODUCT LEADS: Matt Brush, Javier Lopez
CONTACT: Melissa.Konopko@ga4gh.org

Contributors

Work streams

Genomic Knowledge Standards (primary), Clinical & Phenotypic Data Capture, Discovery

Driver projects

ClinGen, Variant Interpretation for Cancer Consortium (VICC), Genomics England, BRCA Challenge, Monarch Initiative