17 January 2019
The GA4GH Data Use Ontology (DUO) lets users semantically tag genomic datasets with usage restrictions, so that the datasets become automatically discoverable based on a health, clinical, or biomedical researcher’s authorization level or intended use. DUO follows the OBO Foundry principles and is developed in the W3C Web Ontology Language (OWL). It is used in production by the European Genome-phenome Archive (EGA) at EMBL-EBI/CRG and by the Broad Institute in its Data Use Oversight System (DUOS).
Human subjects datasets often carry restrictions such as “only available for cancer use” or “only available for the study of pediatric diseases.” These restrictions are deduced from the informed consent form of the original biospecimen collection and must be respected when the datasets are shared and studied. Each institution words the secondary use restrictions and conditions in its informed consent forms differently. DUO is a standard, universal system for categorizing these conditions, so that data access committees and researchers can interpret them in a consistent, structured way.
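To make the idea of tag-based discovery concrete, here is a minimal sketch of how structured data use tags could be matched against a researcher's intended use. Everything in it is an assumption made for illustration: the category labels, the `DataUseTag` and `Dataset` classes, and the `is_permitted` and `discover` functions are hypothetical and are not actual DUO term identifiers or any GA4GH, EGA, or DUOS API. It also compares disease names by exact string match, whereas a real ontology-backed system would reason over term hierarchies.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical category labels, loosely modeled on the restrictions quoted
# above. They are NOT actual DUO term identifiers.
GENERAL_RESEARCH = "general-research"
DISEASE_SPECIFIC = "disease-specific"


@dataclass(frozen=True)
class DataUseTag:
    """A structured usage restriction attached to a dataset (illustrative)."""
    category: str
    disease: Optional[str] = None  # only meaningful for DISEASE_SPECIFIC


@dataclass(frozen=True)
class Dataset:
    name: str
    tag: DataUseTag


def is_permitted(tag: DataUseTag, intended_disease: str) -> bool:
    """Return True if research on `intended_disease` fits the tag."""
    if tag.category == GENERAL_RESEARCH:
        return True
    if tag.category == DISEASE_SPECIFIC:
        # Exact match only; a real system would also accept subtypes.
        return tag.disease == intended_disease
    # Unknown categories are denied by default.
    return False


def discover(datasets: List[Dataset], intended_disease: str) -> List[str]:
    """List the names of datasets whose tag permits the intended use."""
    return [d.name for d in datasets if is_permitted(d.tag, intended_disease)]


datasets = [
    Dataset("cohort-a", DataUseTag(GENERAL_RESEARCH)),
    Dataset("cohort-b", DataUseTag(DISEASE_SPECIFIC, "cancer")),
    Dataset("cohort-c", DataUseTag(DISEASE_SPECIFIC, "pediatric disease")),
]

cancer_matches = discover(datasets, "cancer")
```

Because the restrictions are recorded as structured tags rather than free text from each consent form, the same check can be applied uniformly to datasets from different institutions.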
DUO represents data use terms from three evolving efforts to standardize data use restrictions in the biomedical and genomics research domain:
DUO is an evolving effort to provide a digital ontological representation of all the data use categories defined by the efforts mentioned above. Its evolution is being led by GA4GH Driver Projects such as the EMBL-EBI/CRG EGA, where it is currently used in production, the All of Us Research Program, and the NIH Data Commons Pilot.
DUO was submitted for product approval by the GA4GH Steering Committee on January 15, 2019, and is open for public comment until February 15, 2019. Technical comments are invited via the GitHub issue tracker; general comments should be sent to the GA4GH Data Use mailing list.