Access and adopt open standards for genomic data sharing below. You can also download the full 5-year GA4GH Connect Strategic Plan to see standards in development.
A GA4GH-approved Standard The Data Connect API is a standard for the discovery and search of biomedical data, developed by the GA4GH Discovery Work Stream. The standard provides a mechanism for describing data and its underlying data model, as well as the ability to search the data with the given data model.
A GA4GH-approved Standard The GA4GH Large Scale Genomics Work Stream has worked with UCSC to develop GA4GH BED v1.0—a plain-text file format that describes discrete genomic features by physical start and end positions on a linear chromosome. GA4GH BED v1.0 aims to provide a formal specification that fills in gaps in the existing documentation and creates a concrete set of rules and guidelines.
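As a rough illustration of describing a feature by start and end position, the sketch below parses the three mandatory BED columns. It assumes tab-delimited input and the usual BED convention of 0-based, half-open coordinates; `BedFeature` and `parse_bed_line` are hypothetical names, not part of the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BedFeature:
    chrom: str
    chrom_start: int  # assumed 0-based, inclusive
    chrom_end: int    # assumed 0-based, exclusive

    @property
    def length(self) -> int:
        # With half-open coordinates the length is a plain difference.
        return self.chrom_end - self.chrom_start


def parse_bed_line(line: str) -> BedFeature:
    """Parse the three mandatory columns of one tab-delimited BED line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 columns, got {len(fields)}")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if start < 0 or end < start:
        raise ValueError(f"invalid interval: {start}-{end}")
    return BedFeature(chrom, start, end)


feature = parse_bed_line("chr7\t127471196\t127472363\texample_feature")
print(feature.chrom, feature.length)
```

Optional columns (name, score, strand, and so on) are ignored here; a fuller parser would validate them against the specification.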
A GA4GH-approved Standard The Task Execution Service (TES) API v1 defines a standardized schema and API for describing batch execution tasks. A task defines a set of input files, a set of (Docker) containers and commands to run, a set of output files, and additional logging and metadata. The TES API supports broader genomic and health data sharing by making it easier to bring computation to data and by enabling workflows to run across multiple institutions and infrastructures.
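To make the shape of a task concrete, here is a minimal sketch that assembles a task document with inputs, an executor (container image plus command), and outputs. The field names (`inputs`, `outputs`, `executors`, `image`, `command`, `url`, `path`) reflect one reading of the TES v1 schema and should be checked against the published specification; the helper name, bucket URLs, and image are hypothetical.

```python
import json


def build_tes_task(name, image, command, inputs, outputs):
    """Assemble a TES-style task document as a plain dict.

    `inputs` and `outputs` are lists of (url, container_path) pairs.
    """
    return {
        "name": name,
        "inputs": [{"url": url, "path": path} for url, path in inputs],
        "outputs": [{"url": url, "path": path} for url, path in outputs],
        "executors": [{"image": image, "command": list(command)}],
    }


task = build_tes_task(
    name="md5-of-input",
    image="ubuntu:22.04",  # hypothetical container image
    command=["sh", "-c", "md5sum /data/in.txt > /data/out.md5"],
    inputs=[("s3://example-bucket/in.txt", "/data/in.txt")],
    outputs=[("s3://example-bucket/out.md5", "/data/out.md5")],
)
print(json.dumps(task, indent=2))
```

A client would typically submit this JSON to a TES server's task-creation endpoint; that network call is left out so the sketch stays self-contained.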
A GA4GH-approved Standard The GA4GH Machine Readable Consent Guidance provides instructions for researchers to integrate standard data sharing language into consent forms in a way that can be translated into a computable language. Machine readable consent language can be attached to datasets and stored in their descriptive data using DUO terms. Researchers can then search for datasets whose consent covers their research purposes.
A GA4GH-approved Standard The GA4GH Data Use Ontology (DUO) allows users to semantically tag genomic datasets with usage restrictions, allowing them to become automatically discoverable based on a health, clinical, or biomedical researcher’s authorization level or intended use. DUO is based on the OBO Foundry principles and developed using the W3C Web Ontology Language. It is being used in production by the European Genome-phenome Archive (EGA) at EMBL-EBI/CRG as well as the Broad Institute for the Data Use Oversight System (DUOS).
A GA4GH-approved Standard The Data Repository Service (DRS) API, a standard for building data repositories and adapting access tools to work with those repositories, works with other approved APIs from the GA4GH Cloud Work Stream to allow researchers to discover algorithms across different cloud environments and send them to datasets they wish to analyze. The API allows data consumers to access datasets regardless of the repository in which they are stored or managed.
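The idea of repository-independent access can be sketched as resolving a `drs://` identifier to an HTTPS endpoint. The mapping below follows one reading of the hostname-based DRS URI convention (`drs://<host>/<id>` resolving to `/ga4gh/drs/v1/objects/<id>` on that host) and should be treated as an assumption; the host and object ID are hypothetical.

```python
from urllib.parse import quote, urlparse


def drs_uri_to_url(drs_uri: str) -> str:
    """Translate a hostname-based drs:// URI into an HTTPS object URL."""
    parsed = urlparse(drs_uri)
    if parsed.scheme != "drs" or not parsed.netloc:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri!r}")
    object_id = parsed.path.lstrip("/")
    if not object_id:
        raise ValueError("DRS URI has no object ID")
    # Percent-encode the ID so it is safe as a single path segment.
    return f"https://{parsed.netloc}/ga4gh/drs/v1/objects/{quote(object_id, safe='')}"


print(drs_uri_to_url("drs://drs.example.org/314159"))
```

Fetching that URL would return metadata describing the object and how to access its bytes; the request itself is omitted here.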
A GA4GH-approved Standard The GA4GH Passport specification aims to support data access policies within current and evolving data access governance systems. This specification defines Passports and Passport Visas as the standard way of communicating the data access authorizations that a user has based on either their role (e.g. researcher), affiliation, or access status.
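As an illustration of how a Visa carries an authorization, the sketch below builds an unsigned sample token in JWT layout and reads back its visa claim. The claim name `ga4gh_visa_v1` and its fields (`type`, `value`, `source`, `asserted`, `by`) follow one reading of the Passport specification and are assumptions here; the issuer, subject, and values are hypothetical. The code deliberately skips signature verification, which any real consumer must perform before trusting a visa.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def read_visa_claim(token: str) -> dict:
    """Return the visa claim from a JWT payload WITHOUT verifying the signature."""
    _header, payload_b64, _signature = token.split(".")
    payload = json.loads(b64url_decode(payload_b64))
    return payload["ga4gh_visa_v1"]


sample_payload = {
    "iss": "https://issuer.example.org",  # hypothetical issuer
    "sub": "user-123",                    # hypothetical subject
    "ga4gh_visa_v1": {
        "type": "AffiliationAndRole",
        "value": "faculty@example.org",
        "source": "https://example.org",
        "asserted": 1700000000,
        "by": "system",
    },
}
sample_token = ".".join([
    b64url_encode(json.dumps({"alg": "none", "typ": "JWT"}).encode()),
    b64url_encode(json.dumps(sample_payload).encode()),
    "",  # empty signature: for illustration only
])

visa = read_visa_claim(sample_token)
print(visa["type"], visa["value"])
```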
A GA4GH-approved Standard While ontologies and terminologies provide the standard data concept definitions for capturing clinical information, an information model is required to successfully exchange that information between clinical information systems and with related information systems. Phenopackets provides information models with different levels of complexity to enable high level clinical phenotype information as well as deep clinical phenotype information to be exchanged.
A GA4GH-approved Standard The RNAget API v1 provides a client/server means of retrieving data from several types of RNA experiments, including (i) feature-level expression data from RNA-seq type measurements and (ii) coordinate-based signal/intensity data similar to a bigWig representation.
A GA4GH-approved Standard The Service Info API is an endpoint for describing GA4GH service metadata, designed for extension and inclusion in other APIs. Service info is used to describe a single service, while Service Registry is used to describe multiple services.
A GA4GH-approved Standard The Service Registry API provides information about other GA4GH services, primarily for the purpose of organizing services into networks or groups and service discovery across organizational boundaries. It’s a minimalistic, simple, lightweight, read-only API for listing services and their metadata, as described by service-info.
A GA4GH-approved Standard The Tool Registry Service (TRS) is a standard API for exchanging tools and workflows to analyze, read, and manipulate genomic data. The TRS API is one of a series of technical standards from the Cloud Work Stream that together allow genomics researchers to bring algorithms to datasets in disparate cloud environments, rather than moving data around. TRS gives researchers access to far more tools than they can presently use, and allows developers to register their products so that they are visible on a multitude of sites, expanding their audience reach. The API also provides a set of requirements for tool and workflow registries to implement TRS.
A GA4GH-approved Standard Beacon v2.0 vastly expands the functionality of the first version to improve the tool’s utility and places special emphasis on responsible access to clinical genomic data for research. While v1.0 of the protocol indicated the presence or absence of an allele in a genomics dataset, the new version gives researchers more options when searching for genomic variants and adds flexibility to ask more questions about the dataset and attributes about participants. In secure settings, such as encrypted networks, authorised users can link Beacon results to privacy-protected data such as a patient’s electronic health record, and connect this to expert variant annotation. Additionally, researchers may apply for access to a dataset returned in their query results, and Beacon v2.0 will show them contact information and data use restrictions to assist in that process.
A GA4GH-approved Standard By its nature, genomic data can include information of a confidential nature about the health of individuals. It is important that such information is not accidentally disclosed. One part of the defense against such disclosure is to keep the data in an encrypted format as much as possible. Crypt4GH is a file format that can be used to store data in an encrypted and authenticated state. Existing applications can, with minimal modification, read and write data in the encrypted format. The choice of encryption also allows the encrypted data to be read starting from any location, facilitating indexed access to files.
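Reading from an arbitrary location is possible when the data is encrypted in independent fixed-size segments. The sketch below computes which segments cover a requested plaintext byte range. The sizes used (65,536 plaintext bytes per segment, plus a 12-byte nonce and a 16-byte authentication tag) are recalled from the Crypt4GH specification and should be treated as assumptions; offsets are relative to the end of the file header, and the function name is hypothetical.

```python
SEGMENT_PLAINTEXT = 65536  # assumed plaintext bytes per segment
NONCE_SIZE = 12            # assumed per-segment nonce length
MAC_SIZE = 16              # assumed per-segment authentication tag length
SEGMENT_CIPHERTEXT = NONCE_SIZE + SEGMENT_PLAINTEXT + MAC_SIZE


def segments_for_range(start: int, length: int) -> list[tuple[int, int]]:
    """Return (segment_index, ciphertext_offset) pairs covering a plaintext range.

    Offsets count from the first encrypted segment, i.e. after the header.
    """
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    first = start // SEGMENT_PLAINTEXT
    last = (start + length - 1) // SEGMENT_PLAINTEXT
    return [(i, i * SEGMENT_CIPHERTEXT) for i in range(first, last + 1)]


# A 100,000-byte read starting at plaintext offset 70,000 touches two segments.
print(segments_for_range(70_000, 100_000))
```

Only the listed segments need to be fetched and decrypted, which is what makes indexed access to large encrypted files practical.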
The Family History Tool Inventory is a catalogue of family history tools currently available for documenting family health history information. The Statement of Best Practice is aimed at developers of clinically-oriented family history collection systems, including standalone and EHR-integrated systems, and highlights current approaches and challenges in enabling family history to guide clinical care. The inventory will be updated periodically and we encourage recommendations of other tools to include. To recommend a tool, please email email@example.com.
A GA4GH-approved Standard htsget is a genomic data retrieval specification that allows users to download read data for only the subsections of the genome in which they are interested. Without htsget, users must download the complete files in which that data resides, a slow, resource-intensive process.
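A region-restricted request can be sketched as a URL that names the dataset and the genomic interval of interest. The path layout (`/reads/<id>`) and query parameters (`format`, `referenceName`, `start`, `end`) follow one reading of the htsget specification and should be verified against it; the server address and read ID are hypothetical.

```python
from urllib.parse import quote, urlencode


def htsget_reads_url(base_url, read_id, reference_name, start, end, fmt="BAM"):
    """Build a URL asking for reads overlapping one genomic region."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    query = urlencode({
        "format": fmt,
        "referenceName": reference_name,
        "start": start,
        "end": end,
    })
    return f"{base_url.rstrip('/')}/reads/{quote(read_id, safe='')}?{query}"


url = htsget_reads_url("https://htsget.example.org", "sample1", "chr1", 100000, 200000)
print(url)
```

The server's response would then point the client at the data blocks to fetch for that region, rather than at the whole file.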
A GA4GH-approved Standard All sequencing-based genomic analysis uses a genomic “reference sequence”, a baseline of knowledge against which variations are observed. There are multiple human reference sequences of increasing accuracy, and different organizations refer to the same sequence using different names or reuse names to refer to different reference releases. Reliable, reproducible genomic analysis depends on clear provenance back to reference data. The GA4GH refget API enables unambiguous access to reference genomic sequences from different databases and servers, using a checksum identifier based on the sequence content itself.
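A content-based identifier can be sketched as a digest computed from the sequence itself, so that identical sequences receive identical identifiers regardless of what a database calls them. The scheme below (SHA-512 truncated to 24 bytes, then base64url-encoded) is one digest used in GA4GH specifications; whether a particular refget server uses it or another checksum such as MD5 is an assumption, as is the uppercase normalization. The function names are hypothetical.

```python
import base64
import hashlib


def sha512t24u(data: bytes) -> str:
    """SHA-512 digest truncated to 24 bytes and base64url-encoded (32 characters)."""
    digest = hashlib.sha512(data).digest()
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


def content_digest(sequence: str) -> str:
    """Digest of a sequence after uppercasing (normalization is an assumption)."""
    return sha512t24u(sequence.upper().encode("ascii"))


# The same bases give the same identifier, whatever name a database uses.
print(content_digest("ACGT") == content_digest("acgt"))
print(content_digest("ACGT") == content_digest("ACGA"))
```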
Standardized benchmarking methods and tools are essential to robust accuracy assessment of next generation sequencing variant calling. Benchmarking variant calls requires careful attention to definitions of performance metrics, sophisticated comparison approaches, and stratification by variant type and genome context. The germline small variant benchmarking tools address challenges in (1) matching variant calls with different representations, (2) defining standard performance metrics, (3) enabling stratification of performance by variant type and genome context, and (4) developing and describing limitations of high-confidence calls and regions that can be used as “truth”. They have been piloted in the precisionFDA variant calling challenges to identify the best-in-class variant calling methods within high-confidence regions.
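Once calls have been matched against a truth set, performance is commonly summarized with precision, recall, and F1, computed separately for each stratum. The sketch below uses these common definitions; the exact metric definitions adopted by the benchmarking tools are not given in this text, and the counts shown are made-up illustrative numbers, not results.

```python
def performance_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall, and F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, stratified by variant type: (TP, FP, FN).
counts_by_type = {
    "SNV": (90, 10, 30),
    "indel": (40, 10, 10),
}

stratified = {
    variant_type: performance_metrics(*counts)
    for variant_type, counts in counts_by_type.items()
}
for variant_type, metrics in stratified.items():
    print(variant_type, {name: round(value, 3) for name, value in metrics.items()})
```

Reporting per stratum, rather than one pooled number, keeps a strong result on one variant type from masking a weak result on another.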
A GA4GH-approved Standard The Variation Representation (VR) specification provides a flexible framework of computational models, schemas, and algorithms to precisely and consistently exchange genetic variation data across communities. The specification, which was developed with input from national information resource providers, major public initiatives, and diagnostic testing laboratories, significantly reduces ambiguity in exchanging variation data. In this way, VR aims to improve the reliability and utility of the clinical annotations that are central to personalized medicine. The VR specification consists of five key components that together produce a reliable way of describing and transferring genetic variation data: an extensible terminology and information model, a machine-readable schema, conventions for data normalization, globally unique computed identifiers, and a Python implementation.
A GA4GH-approved Standard Portable tools — the ability to execute a single analysis in a variety of environments — allow researchers to work with more data from more sources, and tool builders to support more researchers and more use cases. The Workflow Execution Service (WES) API provides a standard for exactly that. This API lets users run a single workflow (defined using CWL or WDL) on multiple different platforms, clouds, and environments, and be confident that it will work the same way. The API provides methods to request that a workflow be run, pass parameters to that workflow, get information about running workflows, and cancel a running workflow.
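The four operations listed above can be sketched as a small route table. The base path `/ga4gh/wes/v1` and the `/runs` endpoints follow one reading of the WES v1 specification and should be confirmed against it; `WesRoutes`, `WesRequest`, the server address, and the run ID are hypothetical, and no network calls are made.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class WesRequest:
    method: str
    url: str


class WesRoutes:
    """Maps WES operations to HTTP method and URL (assumed v1 layout)."""

    def __init__(self, base_url: str):
        self.root = base_url.rstrip("/") + "/ga4gh/wes/v1"

    def _run(self, run_id: str) -> str:
        return f"{self.root}/runs/{quote(run_id, safe='')}"

    def submit_run(self) -> WesRequest:
        # Workflow URL, type (for example CWL or WDL), and parameters go in the body.
        return WesRequest("POST", f"{self.root}/runs")

    def get_run(self, run_id: str) -> WesRequest:
        return WesRequest("GET", self._run(run_id))

    def get_run_status(self, run_id: str) -> WesRequest:
        return WesRequest("GET", self._run(run_id) + "/status")

    def cancel_run(self, run_id: str) -> WesRequest:
        return WesRequest("POST", self._run(run_id) + "/cancel")


routes = WesRoutes("https://wes.example.org")
print(routes.submit_run())
print(routes.cancel_run("run-42"))
```

Because every conforming server exposes the same operations, the same client code can target different platforms by changing only the base URL.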