GA4GH 2020-2021 Roadmap

Clinical & Phenotypic Data Capture Work Stream

Product Roadmap

Motivation and Mandate

The widespread adoption of Electronic Health Records (EHRs), deep phenotyping methods, and improved patient-generated multi-modal data provide an opportunity to integrate genomic information into existing or emerging digital health infrastructure to support patient care. The existing health information infrastructure that needs to work with genomics includes the request for a genomic test, the sharing of the test results, and the representation of genomic information in clinical and patient information systems.

This Work Stream will support the clinical adoption of genomics through establishing information models and standards to describe clinical phenotypes for use in genomic medicine and research, including the capture and exchange of information between clinical, research, and patient-centered systems.

A number of GA4GH Driver Projects are already developing the information infrastructure (forms, term lists, information models) that they use to capture or share information. This includes the forms used to capture data on patients as they are sequenced as part of clinical demonstration projects. These examples will provide an important starting point for understanding the terminology and information models needed to describe a clinical phenotype in support of clinical care and research.

Proposed Solution

The potential solution set for this Work Stream will include:

  • Development of standard processes for defining a Reference Set of terms relevant for a particular disease or condition.
  • A standard set of FHIR resources for describing a clinical phenotype.
  • Standardized exchange formats for representing clinical data.

Planned Deliverables

Pedigree V1

  • Type: Data Model / Ontology
  • Expected Submission Date: Q3 2021
  • Requesting Driver Projects: Australian Genomics, Monarch Initiative, BRCA Exchange, Genomics England, GEM Japan

High-quality, unambiguous, computable pedigree and family information is critical for scaling genomic analysis to larger, complex families. Pedigree data is currently represented in heterogeneous formats, which frequently forces the use of lowest-common-denominator formats (e.g., PED) or custom JSON formats for data transfer. The HL7 FHIR Family member history for genetics analysis profile supports pedigrees, but there is a need to add more data elements and rethink the data model to support a broader range of use cases. This will be accomplished by creating a minimum core dataset for family health history and developing the next version of a data model. Both will be evaluated and extended by GA4GH; extensions will be sent back to HL7 for inclusion in FHIR, and feedback will be gathered from G2MC. Standardizing the way systems represent family structure will allow patients to share this information more easily between healthcare systems and help software tools use this information to improve genome analysis and diagnosis.
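To make the "lowest-common-denominator" point concrete, the following is a minimal sketch of reading the six conventional PED columns (family ID, individual ID, father ID, mother ID, sex, phenotype) into structured records. The `PedRecord` class and the `parse_ped` helper are hypothetical names introduced for illustration only; they are not part of any GA4GH or HL7 deliverable, and the example family is invented.

```python
from dataclasses import dataclass
from typing import List, Optional

# Conventional PED codes: sex 1 = male, 2 = female, anything else = unknown;
# phenotype 1 = unaffected, 2 = affected, anything else = missing.
SEX_CODES = {"1": "MALE", "2": "FEMALE"}
AFFECTED_CODES = {"1": False, "2": True}


@dataclass
class PedRecord:
    """One row of a PED file (hypothetical container for this sketch)."""
    family_id: str
    individual_id: str
    father_id: Optional[str]
    mother_id: Optional[str]
    sex: str
    affected: Optional[bool]


def parse_ped_line(line: str) -> PedRecord:
    fields = line.split()
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    family, individual, father, mother, sex, phenotype = fields[:6]
    return PedRecord(
        family_id=family,
        individual_id=individual,
        # "0" conventionally marks a parent who is absent from the file.
        father_id=None if father == "0" else father,
        mother_id=None if mother == "0" else mother,
        sex=SEX_CODES.get(sex, "UNKNOWN"),
        affected=AFFECTED_CODES.get(phenotype),
    )


def parse_ped(text: str) -> List[PedRecord]:
    # Skip blank lines and comment lines starting with "#".
    return [
        parse_ped_line(line)
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]


# Invented trio used only to exercise the parser.
example = """\
FAM1 father 0 0 1 1
FAM1 mother 0 0 2 1
FAM1 child father mother 2 2
"""
records = parse_ped(example)
```

Everything the format carries fits in those six fields: parent links, sex, and a single affected/unaffected flag. Richer family information, such as the additional data elements and use cases mentioned above, has no place in it, which is the gap a shared pedigree data model is meant to close.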

Phenopackets V2

  • Type: Data Model / Ontology
  • Expected V2 Submission Date: Q2 2021
  • V1 Approval Date: 2019
  • Known V1 Implementations and Deployments: Cafe Variome, AMED Biobank Network, RD-Connect, EMBL-EBI (Biosamples), CanDIG/Epishare Metadata Service, Covidaware (Monarch Initiative/Pryzm Health)

Phenopackets is a uniform, machine-readable schema that enables the exchange of both high-level and deep phenotypic information. A phenopacket file contains a set of required and optional fields for sharing information about the phenotype of an individual, patient, or sample, such as age of onset and disease severity. Phenopackets will allow phenotypic data to flow between clinics, databases, labs, and patient registries in ways currently feasible only for more quantifiable data, such as sequence data, and will power phenotype-driven diagnostics and computational analysis.
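As a rough illustration of what such a machine-readable record might look like, the sketch below builds a small phenopacket-like structure in Python and round-trips it through JSON. The field names are modeled loosely on the published Phenopackets schema, but their exact names and nesting (particularly for onset) differ between schema versions and should be checked against the specification. The identifiers, the `ontology_class` helper, and the `REQUIRED_TOP_LEVEL` tuple are assumptions made for this example only.

```python
import json


def ontology_class(term_id, label):
    """Build an ontology term reference (hypothetical helper for this sketch)."""
    return {"id": term_id, "label": label}


# Invented example record; IDs and values are placeholders, not real data.
phenopacket = {
    "id": "example-phenopacket-1",
    "subject": {"id": "patient-1", "sex": "FEMALE"},
    "phenotypicFeatures": [
        {
            # Human Phenotype Ontology terms for the feature, its severity,
            # and its age of onset.
            "type": ontology_class("HP:0001250", "Seizure"),
            "severity": ontology_class("HP:0012828", "Severe"),
            # Nesting of onset is an assumption; it varies by schema version.
            "onset": {
                "ontologyClass": ontology_class("HP:0003593", "Infantile onset")
            },
        }
    ],
    "metaData": {
        "createdBy": "example-curator",
        "resources": [
            {
                "id": "hp",
                "name": "Human Phenotype Ontology",
                "namespacePrefix": "HP",
            }
        ],
    },
}

# Assumption for this sketch: treat these top-level keys as mandatory.
REQUIRED_TOP_LEVEL = ("id", "metaData")


def missing_fields(packet):
    """Return the assumed-required top-level keys absent from the packet."""
    return [key for key in REQUIRED_TOP_LEVEL if key not in packet]


# Serialize for exchange and read it back, as a receiving system would.
serialized = json.dumps(phenopacket, indent=2)
round_tripped = json.loads(serialized)
```

Because every phenotype is expressed as an ontology term rather than free text, a receiving system can compare or query records without parsing clinical narrative. A real implementation would validate against the official schema rather than the simple key check used here.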

Three releases are anticipated: 1) a version update to expand Phenopacket use in oncology and COVID-19 research, including better representation of time and additional fields for treatment course, exposure, and medical actions; 2) a core HL7 FHIR Implementation Guide; and 3) integration of the Variation Representation Specification (VRS) from the GKS Work Stream.