Genomic Data Toolkit

Access and adopt ready-to-use standards for genomic data sharing below, or download the full 5-year GA4GH Connect Strategic Plan.

Data Use Ontology v1

A GA4GH-approved Standard

The GA4GH Data Use Ontology (DUO) lets users semantically tag genomic datasets with usage restrictions, so that the datasets become automatically discoverable based on a health, clinical, or biomedical researcher’s authorization level or intended use. DUO is based on the OBO Foundry principles and developed using the W3C Web Ontology Language. It is used in production by the European Genome-phenome Archive (EGA) at EMBL-EBI/CRG and by the Broad Institute for the Data Use Oversight System (DUOS).
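As a rough illustration of tag-based discovery, the sketch below tags a few datasets with DUO terms and filters them by a researcher's intended use. The three term IDs are real DUO identifiers, but the `PERMITS` table, the `Dataset` class, and the dataset names are hypothetical simplifications; actual matching relies on the ontology's full hierarchy and modifier terms.

```python
from dataclasses import dataclass

# Real DUO term IDs.
NRES = "DUO:0000004"  # no restriction
GRU = "DUO:0000042"   # general research use
HMB = "DUO:0000006"   # health or medical or biomedical research

# ASSUMPTION: a hand-written, simplified table of which intended uses each
# dataset restriction admits. It is not derived from the ontology itself.
PERMITS = {
    NRES: {GRU, HMB},
    GRU: {GRU, HMB},
    HMB: {HMB},
}


@dataclass(frozen=True)
class Dataset:
    name: str      # hypothetical dataset label
    duo_term: str  # usage restriction attached to the dataset


def discoverable(datasets, intended_use):
    """Return names of datasets whose restriction admits the intended use."""
    return [
        d.name
        for d in datasets
        if intended_use in PERMITS.get(d.duo_term, set())
    ]


datasets = [
    Dataset("open-cohort", NRES),
    Dataset("general-cohort", GRU),
    Dataset("biomedical-cohort", HMB),
]

health_matches = discoverable(datasets, HMB)
general_matches = discoverable(datasets, GRU)
```

In this toy model a health-focused request matches all three datasets, while a general research request does not match the dataset restricted to health, medical, or biomedical research.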

Beacon API v1

A GA4GH-approved Standard

The Beacon API can be implemented as a web-accessible service that users may query for information about a specific allele. A user of a Beacon can pose the query “Have you observed this nucleotide (e.g. C) at this genomic location (e.g. position 32,936,732 on chromosome 13)?”, to which the Beacon responds with either “yes” or “no”. The new release of the Beacon API extends this functionality with support for additional types of genomic variants and improved metadata. In addition, the accompanying ELIXIR Beacon reference implementation demonstrates the ELIXIR Authorization and Authentication Infrastructure (AAI), which enables data owners to light Beacons at different tiers of data access: public, registered, or controlled.
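The allele question above can be expressed as a single HTTP GET. The following is a minimal sketch that only builds the request URL, using Beacon v1 query parameter names and assuming the v1 convention of 0-based `start` coordinates. The host `beacon.example.org`, the reference base `G`, and the assembly `GRCh37` are placeholder assumptions, not values taken from the text.

```python
from urllib.parse import urlencode


def beacon_query_url(base_url, chromosome, position_1based,
                     reference_bases, alternate_bases, assembly_id):
    """Build a Beacon v1 style allele query URL (no request is sent)."""
    params = {
        "referenceName": chromosome,
        # ASSUMPTION: Beacon v1 uses 0-based coordinates, so the
        # human-readable 1-based position is shifted by one.
        "start": position_1based - 1,
        "referenceBases": reference_bases,
        "alternateBases": alternate_bases,
        "assemblyId": assembly_id,
    }
    return f"{base_url.rstrip('/')}/query?{urlencode(params)}"


# Hypothetical host, reference base, and assembly, for illustration only.
url = beacon_query_url(
    "https://beacon.example.org", "13", 32936732, "G", "C", "GRCh37"
)
```

A Beacon answers such a request with a JSON document whose `exists` field carries the yes/no result.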

CRAM File Format v3

The CRAM file format is an efficient storage format for read data, achieving significantly better lossless compression than BAM whilst maintaining full compatibility with BAM.

Family History Tools Inventory

The Family History Tools Inventory is a catalogue of currently available tools for documenting family health history information. For developers of clinically oriented family history collection systems, including standalone and EHR-integrated systems, the Statement of Best Practice highlights current approaches and challenges in using family history to guide clinical care. The inventory will be updated periodically, and we encourage recommendations of other tools to include. To recommend a tool, please email

htsget API v1

A GA4GH-approved Standard

htsget is a genomic data retrieval specification that allows users to download read data for only the subsections of the genome in which they are interested. Without htsget, users must download the whole set of files in which that data resides, a slow, resource-intensive process.
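To make the region-based retrieval concrete, here is a small sketch of the two client-side steps: building a reads request for one genomic interval, and extracting the data block URLs from the JSON "ticket" a server returns. The host, read ID, region, and ticket contents are hypothetical; the parameter names (`format`, `referenceName`, `start`, `end`) and the `htsget`/`urls` ticket layout follow the htsget v1 specification as I understand it.

```python
import json
from urllib.parse import urlencode


def htsget_reads_url(base_url, read_id, reference_name, start, end, fmt="BAM"):
    """Build a reads request for one region (0-based, half-open interval)."""
    query = urlencode({
        "format": fmt,
        "referenceName": reference_name,
        "start": start,
        "end": end,
    })
    return f"{base_url.rstrip('/')}/reads/{read_id}?{query}"


def block_urls(ticket_json):
    """Return the data block URLs listed in an htsget ticket, in order."""
    ticket = json.loads(ticket_json)["htsget"]
    return [block["url"] for block in ticket["urls"]]


# Hypothetical server and sample identifier.
request_url = htsget_reads_url(
    "https://htsget.example.org", "sample-001", "chr1", 10000, 20000
)

# Hypothetical ticket of the shape a server might return.
sample_ticket = json.dumps({
    "htsget": {
        "format": "BAM",
        "urls": [
            {"url": "https://data.example.org/sample-001/header"},
            {"url": "https://data.example.org/sample-001/block-1"},
        ],
    }
})
urls = block_urls(sample_ticket)
```

A client would then fetch each URL in order and concatenate the payloads to obtain a valid file covering only the requested region.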

refget API v1

A GA4GH-approved Standard

All sequencing-based genomic analysis uses a genomic “reference sequence”: a baseline of knowledge against which variations are observed. There are multiple human reference sequences of increasing accuracy, and different organizations refer to the same sequence using different names or reuse names to refer to different reference releases. Reliable, reproducible genomic analysis depends on clear provenance back to reference data. The GA4GH refget API enables unambiguous access to reference genomic sequences from different databases and servers, using a checksum identifier based on the sequence content itself.
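The idea of a content-derived identifier can be sketched with Python's standard `hashlib`. The example assumes the two checksum schemes of refget v1: an MD5 hex digest, and "TRUNC512", taken here to be the first 24 bytes of a SHA-512 digest, hex-encoded. The `normalize` helper (strip whitespace, uppercase) is my own assumption about how a sequence is prepared before hashing.

```python
import hashlib


def normalize(sequence):
    """ASSUMPTION: remove all whitespace and uppercase before hashing."""
    return "".join(sequence.split()).upper()


def md5_id(sequence):
    """MD5 hex digest of the normalized sequence (32 hex characters)."""
    return hashlib.md5(normalize(sequence).encode("ascii")).hexdigest()


def trunc512_id(sequence):
    """First 24 bytes of the SHA-512 digest, hex-encoded (48 characters)."""
    digest = hashlib.sha512(normalize(sequence).encode("ascii")).digest()
    return digest[:24].hex()


# The same bases yield the same identifier however they are formatted.
same = md5_id("acgt\nACGT") == md5_id("ACGTACGT")
```

Because the identifier depends only on the bases, two servers that name a sequence differently still agree on its checksum.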

SAM/BAM File Formats v1

Specifications for storing next-generation sequencing read data.

Variant Benchmarking Tools

Standardized benchmarking methods and tools are essential to robust accuracy assessment of next-generation sequencing variant calling. Benchmarking variant calls requires careful attention to definitions of performance metrics, sophisticated comparison approaches, and stratification by variant type and genome context. The germline small variant benchmarking tools address challenges in (1) matching variant calls with different representations, (2) defining standard performance metrics, (3) enabling stratification of performance by variant type and genome context, and (4) developing and describing limitations of high-confidence calls and regions that can be used as “truth”. They have been piloted in the precisionFDA variant calling challenges to identify the best-in-class variant calling methods within high-confidence regions.
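Points (2) and (3) can be illustrated with a short calculation of precision, recall, and F1 per stratum. This is a simplified sketch, not the benchmarking tools' own implementation: the counts are invented, and the stratum labels `SNP` and `INDEL` are used only as examples of stratification by variant type.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false call counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented counts per variant type, for illustration only.
counts = {
    "SNP": {"tp": 90, "fp": 10, "fn": 30},
    "INDEL": {"tp": 40, "fp": 10, "fn": 10},
}

# One metrics tuple per stratum, so differences between strata stay visible.
metrics = {
    stratum: precision_recall_f1(**c) for stratum, c in counts.items()
}
```

Reporting metrics per stratum rather than as a single pooled number keeps a weakness in one variant type from being masked by good performance in another.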

VCF v4 / BCF v2 File Formats

Specifications for the Variant Call Format (VCF) and its binary counterpart, BCF.

Workflow Execution Service (WES) API v1

A GA4GH-approved Standard

Portable tools, which can execute a single analysis in a variety of environments, allow researchers to work with more data from more sources, and tool builders to support more researchers and more use cases. The Workflow Execution Service (WES) API provides a standard for exactly that. The API lets users run a single workflow (defined using CWL or WDL) on multiple platforms, clouds, and environments, and be confident that it will work the same way. It provides methods to request that a workflow be run, pass parameters to that workflow, get information about running workflows, and cancel a running workflow.
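The four operations listed above map onto a handful of HTTP calls. The sketch below only describes those calls as plain tuples and sends nothing over the network. The `/ga4gh/wes/v1` base path, the `/runs` endpoints, and the request field names reflect my reading of WES v1 and should be checked against the specification; the workflow URL, parameters, and run ID are hypothetical.

```python
import json

# ASSUMPTION: base path used by WES v1 servers.
WES_BASE = "/ga4gh/wes/v1"


def run_workflow(workflow_url, workflow_type, workflow_type_version, params):
    """Describe the request that submits a workflow with its parameters."""
    form = {
        "workflow_url": workflow_url,
        "workflow_type": workflow_type,  # "CWL" or "WDL"
        "workflow_type_version": workflow_type_version,
        "workflow_params": json.dumps(params),  # parameters travel as JSON
    }
    return ("POST", f"{WES_BASE}/runs", form)


def get_run(run_id):
    """Describe the request for detailed information about one run."""
    return ("GET", f"{WES_BASE}/runs/{run_id}", None)


def get_run_status(run_id):
    """Describe the request for the abbreviated status of one run."""
    return ("GET", f"{WES_BASE}/runs/{run_id}/status", None)


def cancel_run(run_id):
    """Describe the request that cancels a running workflow."""
    return ("POST", f"{WES_BASE}/runs/{run_id}/cancel", None)


# Hypothetical workflow location, parameters, and run ID.
submit = run_workflow(
    "https://workflows.example.org/align.cwl", "CWL", "v1.0",
    {"sample": "sample-001"},
)
status = get_run_status("run-123")
cancel = cancel_run("run-123")
```

Because every compliant server exposes the same calls, a client written this way could target a different platform by changing only the host it sends them to.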
