Now open for comment: GA4GH Data Use Ontology

17 Jan 2019

The GA4GH Data Use Ontology (DUO) lets users semantically tag genomic datasets with usage restrictions, so that the datasets become automatically discoverable based on a health, clinical, or biomedical researcher’s authorization level or intended use.

 

DUO is based on the OBO Foundry principles and developed using the W3C Web Ontology Language. It is used in production by the European Genome-phenome Archive (EGA) at EMBL-EBI/CRG and by the Broad Institute for the Data Use Oversight System (DUOS).

Human subjects datasets often carry restrictions such as “only available for cancer use” or “only available for the study of pediatric diseases.” These restrictions derive from the informed consent form used in the original biospecimen collection, and they must be respected when the datasets are shared and studied. Each institution uses its own language in its informed consent forms to describe secondary use restrictions and conditions. DUO is a standard, universal system for categorizing these conditions, with the aim of allowing data access committees and researchers to interpret them in a consistent, structured way.
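To make the idea of tagging and automated discovery concrete, here is a minimal sketch in Python. It is an illustration only, not the DUO specification or any production implementation: the class names, the three category abbreviations used as tags (GRU for general research use, HMB for health/medical/biomedical research, DS for disease-specific research), and the matching rule are all simplifying assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DataUseTag:
    """Hypothetical, simplified stand-in for a data use term on a dataset."""
    category: str                  # assumed values: "GRU", "HMB", "DS"
    disease: Optional[str] = None  # only meaningful when category == "DS"


@dataclass(frozen=True)
class Dataset:
    name: str
    tag: DataUseTag


@dataclass(frozen=True)
class ResearchPurpose:
    """Hypothetical description of what a researcher intends to do."""
    is_health_research: bool
    disease: Optional[str] = None


def permits(tag: DataUseTag, purpose: ResearchPurpose) -> bool:
    """Return True if the tagged restriction allows the stated purpose.

    Assumed rule for this sketch: GRU allows anything, HMB allows any
    health research, DS allows only research on the named disease.
    """
    if tag.category == "GRU":
        return True
    if tag.category == "HMB":
        return purpose.is_health_research
    if tag.category == "DS":
        return purpose.disease is not None and purpose.disease == tag.disease
    return False  # unknown categories are denied by default


def discoverable(datasets: List[Dataset], purpose: ResearchPurpose) -> List[str]:
    """Names of the datasets whose tags permit the given purpose."""
    return [d.name for d in datasets if permits(d.tag, purpose)]


datasets = [
    Dataset("cohort-a", DataUseTag("GRU")),
    Dataset("cohort-b", DataUseTag("DS", "cancer")),
    Dataset("cohort-c", DataUseTag("DS", "pediatric disease")),
    Dataset("cohort-d", DataUseTag("HMB")),
]

cancer_study = ResearchPurpose(is_health_research=True, disease="cancer")
print(discoverable(datasets, cancer_study))
```

In this sketch a cancer researcher would see the general-use, health-research, and cancer-specific datasets, while the dataset restricted to pediatric disease stays hidden. A real system would compare ontology terms rather than plain strings, so that, for example, a study of a cancer subtype could match a dataset consented for cancer research.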

DUO represents data use terms from three evolving efforts to standardize data use restrictions in the biomedical and genomics research domain:

  • NIH database of Genotypes and Phenotypes (dbGaP) data use categories. dbGaP is one of the largest public repositories of genomics data in the world.
  • Consent Codes – a global effort led by Stephanie OM Dyke (McGill University) and the GA4GH Regulatory and Ethics Work Stream to define ‘codes’ for specific categories of data use restrictions, based on the datasets of the main public genome archives (NCBI dbGaP and EMBL-EBI/CRG EGA).
  • The Automated Data Access Matrix (ADA-M) – work led by Anthony Brookes and other GA4GH members of the ADA-M task team to build a matrix of data use categories for expressing data use restrictions and research purposes.

DUO is an evolving effort to provide a digital ontological representation of all the data use categories defined by the efforts mentioned above. Its evolution is being led by GA4GH Driver Projects such as the EMBL-EBI/CRG EGA, where it is currently used in production; the All of Us Research Program; and the NIH Data Commons Pilot.

DUO was submitted for product approval by the GA4GH Steering Committee on January 15, 2019, and is open for public comment until February 15, 2019. Technical comments are invited via the GitHub issue tracker; general comments should be sent to the GA4GH Data Use mailing list.
