GA4GH 2020-2021 Roadmap

Discovery Work Stream

Product Roadmap

Motivation and Mandate

We are in an era of abundant genomic information, fueled by steadily decreasing sequencing and processing costs and by service platforms that ease analysis. These critical resources are spread throughout the world and are increasingly challenging to aggregate for many reasons, including scale, regulatory differences, and the difficulty of harmonizing data from diverse origins. We believe a solution to this challenge is to facilitate the discovery and use of these varied data sources and services via standard APIs and context-aware user interfaces. The Discovery Work Stream aims to create a unified data discovery platform that makes it easier to find and use data, tools, and infrastructure for genomic and clinical analysis.

Existing Standards

Organizations such as the Matchmaker Exchange, the Beacon project, BRCA Exchange, and many others approach fragmented and diverse data sources by locally aggregating, harmonizing, and redistributing processed data through web-based user interfaces and standardized APIs. Unfortunately, each has its own data formats and data sharing conventions. This makes it difficult and inefficient for consumers to cross-reference these invaluable resources and gain the combined value they could offer. Further, datasets generated with different sequencing and processing technologies, as well as samples that overlap across datasets, add to the challenges of interpretation.

Proposed Solution

The Discovery Work Stream proposes a unified interface that acts as a facade over a varied, dynamic collection (or registry) of data sources and services, forming an interconnected ‘Internet of Genomics Data and Services.’ The network’s data sources and services can be crawled and indexed behind a single standardized API endpoint, whose results a unified web interface can aggregate and present in a meaningful, context-aware manner. To achieve this, the Work Stream will design a suite of standards that:

  • are easy to implement, with a community-maintained reference implementation.
  • reflect the context of the data they share.
  • reflect nuances in data sharing preferences.
  • leave room to include information from meta-sites, such as DUOS, to help guide data use.
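The crawl-and-index idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a GA4GH design: the source names, the catalog shape (datasets with descriptive tags), and the `crawl`/`search` helpers are all hypothetical. Each source's catalog is crawled into one inverted index, and a single entry point answers queries across every source while recording which source each hit came from.

```python
from collections import defaultdict

# Hypothetical catalogs: each source advertises its datasets with descriptive tags.
CATALOGS = {
    "site-a": [{"dataset": "cohort-1", "tags": ["BRCA1", "breast cancer"]}],
    "site-b": [
        {"dataset": "rare-disease-wgs", "tags": ["WGS", "BRCA1"]},
        {"dataset": "pgx-panel", "tags": ["CYP2D6"]},
    ],
}


def crawl(catalogs: dict[str, list[dict]]) -> dict[str, list[tuple[str, str]]]:
    """Build an inverted index: lowercased tag -> [(source, dataset), ...]."""
    index: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for source, datasets in catalogs.items():
        for entry in datasets:
            for tag in entry["tags"]:
                index[tag.lower()].append((source, entry["dataset"]))
    return dict(index)


def search(index: dict[str, list[tuple[str, str]]], term: str) -> list[tuple[str, str]]:
    """Single query entry point over every crawled source (case-insensitive)."""
    return index.get(term.lower(), [])


index = crawl(CATALOGS)
print(search(index, "brca1"))  # [('site-a', 'cohort-1'), ('site-b', 'rare-disease-wgs')]
```

Indexing ahead of time keeps queries fast and independent of each source's availability at query time. The trade-off is that the index can go stale, so a real deployment would need to re-crawl periodically.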

Planned Deliverables

Beacon API v2

  • Type: API
  • V1 Approval Date: 2018
  • V2 Expected Submission Date: Q3 2021
  • Requesting Driver Projects: ELIXIR, ENA/EGA/EVA, VICC, BRCA Exchange, Genomics England, SPHN, AGHA, Autism Speaks, EUCANCan

Beacon v2 evolves Beacon from discovering and sharing variant data to (safely) discovering and sharing other entities related to variants and genomic diagnoses.
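A Beacon answers questions about a dataset without returning the underlying records. The sketch below shows this as a toy in-memory service. Variant lookup by assembly, chromosome, 0-based start, and reference/alternate bases follows the shape of Beacon v1 queries. The "boolean" versus "count" response levels loosely follow the granularity idea in Beacon v2. The field names `exists` and `numTotalResults`, the `beacon_query` function, and the dataset contents are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    assembly_id: str
    reference_name: str
    start: int  # 0-based position, as in Beacon v1 allele queries
    reference_bases: str
    alternate_bases: str


# Toy dataset (made up): each variant maps to the IDs of individuals carrying it.
DATASET = {
    Variant("GRCh38", "17", 43045711, "G", "A"): {"ind-001", "ind-007"},
}


def beacon_query(variant: Variant, granularity: str = "boolean") -> dict:
    """Answer a query at the requested granularity without exposing record IDs."""
    carriers = DATASET.get(variant, set())
    if granularity == "boolean":
        return {"exists": bool(carriers)}
    if granularity == "count":
        return {"exists": bool(carriers), "numTotalResults": len(carriers)}
    raise ValueError(f"unsupported granularity: {granularity}")


query = Variant("GRCh38", "17", 43045711, "G", "A")
print(beacon_query(query))           # {'exists': True}
print(beacon_query(query, "count"))  # {'exists': True, 'numTotalResults': 2}
```

Returning only a yes/no answer or a count, never the carrier IDs, is what lets a Beacon be queried openly while the individual-level data stays behind the data holder's controls.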

Discovery Search API

  • Type: API
  • V1 Approval Date: 2018
  • V2 Expected Submission Date: Q3 2021
  • Requesting Driver Projects: ELIXIR Beacon, ENA/EGA/EVA

The GA4GH Search API enables search engines for genomic and clinical data by specifying a query language that spans genomic, phenotypic, and clinical data. This query language can be used to implement Beacons and Matchmakers, as well as other applications (e.g., diagnostics, pharmacogenomics, family analysis).
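The sketch below shows a single query spanning genomic and phenotypic data. It assumes an SQL-style query language, which is how the Search API draft expresses queries as we understand it. Python's built-in `sqlite3` stands in for a real search engine. The table names, columns, sample data, and the `{"query": ...}` request body for a `/search` endpoint are assumptions for illustration, not the specification's schema.

```python
import json
import sqlite3

# In-memory stand-in for data exposed through a search endpoint.
# Table and column names (and all rows) are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE variants (sample_id TEXT, gene TEXT, consequence TEXT);
CREATE TABLE phenotypes (sample_id TEXT, phenotype TEXT);
INSERT INTO variants VALUES
  ('s1', 'CYP2D6', 'missense_variant'),
  ('s2', 'CYP2D6', 'synonymous_variant'),
  ('s3', 'MLH1', 'stop_gained');
INSERT INTO phenotypes VALUES
  ('s1', 'phenotype-A'),
  ('s3', 'phenotype-B');
""")

# One query joining genomic (variants) and phenotypic (phenotypes) data.
QUERY = """
SELECT v.sample_id, v.gene, p.phenotype
FROM variants v JOIN phenotypes p ON v.sample_id = p.sample_id
WHERE v.consequence <> 'synonymous_variant'
ORDER BY v.sample_id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('s1', 'CYP2D6', 'phenotype-A'), ('s3', 'MLH1', 'phenotype-B')]

# Request body a client might POST to a hypothetical /search endpoint.
request_body = json.dumps({"query": " ".join(QUERY.split())})
```

Because the query language is shared rather than application-specific, the same kind of request could back a Beacon-style existence check, a Matchmaker-style cohort match, or a pharmacogenomics lookup.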

Approved Deliverables

Service Registry/Service Info

  • Type: API
  • V1 Approval Date: 2019
  • Known Implementations & Deployments: ELIXIR, Autism Sharing Initiative

Service Info/Registry provides the network infrastructure for the proposed “Internet of Genomics”. A registry lists the GA4GH services (e.g., Beacons, DRS) and other registries (e.g., Matchmaker Exchange) that have been registered with it. It supports dynamic registration of online GA4GH APIs (data, tools, services), so that they can be discovered on demand and used in real time.
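The registration-and-discovery loop can be sketched with a minimal in-memory registry. The record fields follow the GA4GH service-info schema as we understand it: `id`, `name`, a `type` made of `group`/`artifact`/`version`, `organization`, and `version`. The `url` field, the `ServiceRegistry` class, its `register`/`find` methods, and the example.org services are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceType:
    group: str     # namespace, e.g. "org.ga4gh"
    artifact: str  # kind of service, e.g. "beacon" or "drs"
    version: str   # version of the API specification


@dataclass
class ServiceInfo:
    id: str
    name: str
    type: ServiceType
    organization: dict
    version: str  # version of this particular deployment
    url: str      # where the registered service can be reached (assumed field)


@dataclass
class ServiceRegistry:
    services: list[ServiceInfo] = field(default_factory=list)

    def register(self, info: ServiceInfo) -> None:
        """Dynamic registration: add a service, replacing any entry with the same id."""
        self.services = [s for s in self.services if s.id != info.id]
        self.services.append(info)

    def find(self, artifact: str) -> list[ServiceInfo]:
        """On-demand discovery: list registered services of one kind."""
        return [s for s in self.services if s.type.artifact == artifact]


registry = ServiceRegistry()
org = {"name": "Example Org", "url": "https://example.org"}
registry.register(ServiceInfo("org.example.beacon", "Example Beacon",
                              ServiceType("org.ga4gh", "beacon", "1.0.0"),
                              org, "0.1.0", "https://beacon.example.org"))
registry.register(ServiceInfo("org.example.drs", "Example DRS",
                              ServiceType("org.ga4gh", "drs", "1.0.0"),
                              org, "0.1.0", "https://drs.example.org"))
print([s.name for s in registry.find("beacon")])  # ['Example Beacon']
```

Because every entry carries the same self-describing metadata, a client or another registry can discover services by type without prior knowledge of who runs them. The same idea lets registries list other registries.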