We are in an era of abundant genomic information, fueled by steadily decreasing sequencing and processing costs and by service platforms that ease analysis. These critical resources are spread throughout the world and are increasingly challenging to aggregate for many reasons, including scale, regulatory differences, and the difficulty of harmonizing data from diverse origins. We believe a solution to this challenge is to facilitate the discovery and use of these varied data sources and services via standard APIs and context-aware user interfaces. The Discovery Work Stream aims to create a unified data discovery platform that makes it easier to find and use data, tools, and infrastructure for genomics and clinical analysis.
Organizations such as the Matchmaker Exchange, the Beacon project, BRCA Exchange, and many others approach fragmented and diverse data sources by locally aggregating, harmonizing, and redistributing processed data through web-based user interfaces and standardized APIs. Unfortunately, each has its own data-sharing formats and nuances, which make it difficult and inefficient for consumers to cross-reference these invaluable resources and gain value from combining them. Further, datasets that arise from different sequencing and processing technologies, or that contain overlapping samples, add to the interpretation challenges.
The Discovery Work Stream proposes a unified interface that acts as a facade to a varied, dynamic collection or registry of data sources and services, forming an interconnected ‘Internet of Genomics Data and Services.’ The network’s data sources and services can be crawled and indexed and then exposed through a single standardized API endpoint, whose results a unified web interface can aggregate and present in a context-aware, meaningful manner. To achieve this, the Work Stream will design a suite of standards, including:
Beacon is evolving from discovering and sharing variant data to also (safely) discovering and sharing entities related to variant and genomic diagnoses.
The GA4GH Search API enables a search engine for genomic and clinical data by providing a specification for a query language that spans genomic, phenotypic, and clinical data. The specification can be used to implement, for example, Beacons and Matchmakers, as well as other applications (e.g., diagnostics, pharmacogenomics, family analysis).
The Service Info/Registry provides a digital network infrastructure for a proposed “Internet of Genomics”. The registry lists GA4GH services (e.g., Beacons, DRS) and other registries (e.g., Matchmaker Exchange) that have been registered with it. The Service Info/Registry allows for dynamic registration and on-demand discovery of online GA4GH APIs (data, tools, services) so that they can be found and used in real time.
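To make the registry-plus-facade idea above more concrete, the following is a minimal, in-memory sketch rather than an implementation of any GA4GH specification. The class and function names (`ServiceRecord`, `Registry`, `query_all`), the record fields, the `example.org` URLs, and the toy variant tuples are all assumptions introduced for illustration. Remote services are simulated by plain Python callables, so no network access is involved.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Toy variant representation (assumption): (chromosome, position, alternate base).
Variant = Tuple[str, int, str]


@dataclass(frozen=True)
class ServiceRecord:
    """One registry entry. Field names are illustrative, not taken from a spec."""
    id: str
    name: str
    artifact: str  # kind of service, e.g. "beacon" or "drs"
    url: str


class Registry:
    """In-memory stand-in for a service registry."""

    def __init__(self) -> None:
        self._services: Dict[str, ServiceRecord] = {}

    def register(self, record: ServiceRecord) -> None:
        # Dynamic registration: a record with an existing id replaces the old one.
        self._services[record.id] = record

    def find(self, artifact: str) -> List[ServiceRecord]:
        # On-demand discovery: return every registered service of one kind.
        return [s for s in self._services.values() if s.artifact == artifact]


def query_all(
    registry: Registry,
    backends: Dict[str, Callable[[Variant], bool]],
    variant: Variant,
) -> Dict[str, bool]:
    """Facade: ask each registered beacon-like service whether it holds the variant.

    `backends` maps a service URL to a callable that simulates the remote call.
    Services without a simulated backend are skipped.
    """
    return {
        s.id: backends[s.url](variant)
        for s in registry.find("beacon")
        if s.url in backends
    }


# Hypothetical services; the URLs do not point at real deployments.
registry = Registry()
registry.register(ServiceRecord("beacon-a", "Beacon A", "beacon", "https://beacon-a.example.org"))
registry.register(ServiceRecord("beacon-b", "Beacon B", "beacon", "https://beacon-b.example.org"))
registry.register(ServiceRecord("drs-a", "DRS A", "drs", "https://drs-a.example.org"))

known_to_a = {("1", 1000, "A")}
backends: Dict[str, Callable[[Variant], bool]] = {
    "https://beacon-a.example.org": lambda v: v in known_to_a,
    "https://beacon-b.example.org": lambda v: False,
}

answers = query_all(registry, backends, ("1", 1000, "A"))
print(answers)
```

In this sketch the caller only talks to `query_all`; which services exist and where they live is resolved through the registry at query time, which is the property the facade design relies on. A real deployment would replace the callables with authenticated HTTP requests and would need to handle timeouts, access control, and differing response formats.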