GA4GH 2020-2021 Roadmap

Genomic Knowledge Standards Work Stream

Product Roadmap

Motivation and Mandate

Genomic data analysis and interpretation are at the heart of enabling genomic data to improve human health. Many analyses require locating interesting and potentially causative changes in genomic sequence before attempting to categorize, rank, and prioritise potential leads by intersecting patient data with known reference data sets. At present, each analysis method develops its own solutions to access reference genomic sequence, find and use baseline reference genomic annotation (e.g. genes, variations, regulatory regions, expression), integrate and find equivalence with other resources, model data, and distribute the results of the analysis to downstream consumers, whether human or computational. In addition, the provenance of annotation can be unclear and associated metadata may be unstructured. As a consequence, results from two resources may not be directly comparable due to ambiguity in data representation, semantics, and provenance.

Existing Standards

VMC (Variation Modeling Collaboration) is a specification, now at version 0.1, for modeling simple variation; it was developed by members of the Variation Annotation Task Team (VATT). FHIR (Fast Healthcare Interoperability Resources) is a specification to enable the transfer of healthcare information over standard APIs. In addition, a number of GA4GH standards for modeling ontologies, genomic annotation, and RNA quantification have been developed as part of the schema/reference/compliance suite of applications.

Proposed Solution

The Genomic Knowledge Standards Work Stream (GKSWS) aims to develop, adopt, and adapt standards-based components to enable the exchange of reference genomic information through common APIs, thereby enabling the downstream analysis of genomic data. It will focus on developing specifications related to genomic sequence, annotation, and associated metadata/provenance.

GKSWS will engage with GA4GH Driver Projects, including analysis tool developers/consumers (VICC, GEL) and reference data providers (ClinGen, Ensembl), to ensure that standards-based solutions to data access and exchange are developed based on real-world use cases whilst also being applicable to more generalized scenarios. GKSWS will work closely with other GA4GH Work Streams (Large Scale Genomics, Discovery) in areas of common interest to move standards into production (VMC), and will partner with external standards development organizations to leverage existing specifications and to ensure that GKSWS-developed standards are suitable for healthcare environments (HL7, FHIR).

Planned Deliverables

Variant Annotation

  • Type: Data Model/Ontology
  • Expected V1 Approval Date: Q2 2021
  • Requesting Driver Projects: ClinGen, VICC, Genomics England, BRCA Challenge, Monarch Initiative

The VA Specification (VA Spec) will define extensible data models to support representation of diverse kinds of statements made about genetic variation, and the evidence and provenance supporting these statements. The specification will include information models, message exchange schema, and a formal framework for defining custom extensions to the core model. A more detailed description of these components can be found here.

Approved Deliverables

Variant Representation

  • Type: Data Model/Ontology, Protocol
  • V1 Approval Date: 2019
  • Known Implementations & Deployments: ClinGen, VICC, BRCA Challenge

The VR specification is a standardised, extensible model for the computational representation of variation. The specification includes a data model, a message schema, and methods for normalizing variants and computing identifiers. Any update to a concept within the specification is accompanied by corresponding updates to each of these components.
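To illustrate what a computed identifier can look like, the following is a minimal sketch only. It assumes a digest-based scheme in which a variation object is serialized canonically and then hashed. The class SimpleAllele, its field names, the "example:VA" prefix, and the serialization rules are hypothetical and are not taken from the VR specification; normalization is not covered.

```python
import base64
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SimpleAllele:
    """Hypothetical, simplified variation record (not the VR spec's schema)."""
    sequence_id: str  # identifier of the reference sequence (assumed format)
    start: int        # start coordinate (coordinate convention is an assumption)
    end: int          # end coordinate
    alt: str          # replacement sequence


def truncated_digest(blob: bytes, length: int = 24) -> str:
    """Return the SHA-512 digest of blob, truncated to `length` bytes
    and base64url-encoded."""
    digest = hashlib.sha512(blob).digest()[:length]
    return base64.urlsafe_b64encode(digest).decode("ascii")


def computed_identifier(allele: SimpleAllele, prefix: str = "example:VA") -> str:
    """Derive an identifier from the content of the object itself."""
    # Serialize canonically (sorted keys, no whitespace) so that equal
    # objects always yield the same bytes and therefore the same identifier.
    blob = json.dumps(
        asdict(allele), sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    return f"{prefix}.{truncated_digest(blob)}"


# Illustrative values only; they do not describe a real variant.
a = SimpleAllele("example:seq1", 100, 101, "T")
b = SimpleAllele("example:seq1", 100, 101, "T")
c = SimpleAllele("example:seq1", 100, 101, "C")

print(computed_identifier(a) == computed_identifier(b))  # same content, same identifier
print(computed_identifier(a) == computed_identifier(c))  # different content, different identifier
```

Because the identifier is derived from content alone, two resources that model the same variation identically would arrive at the same identifier without coordinating through a central registry, which is the property that makes such identifiers useful for data exchange.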

The VR roadmap will extend our 1.0 release to define and model classes of variation in support of complex and aggregate variation.