4 October 2019
On Friday, September 27, the GA4GH Data Use and Researcher Identity (DURI) Work Stream hosted the webinar “Automating access to human genomics datasets: the GA4GH Data Use Ontology in action.” More than 100 people tuned in to learn about the Data Use Ontology (DUO), a GA4GH standard for automating access to human genomics data. The webinar featured presentations from eight international speakers who have contributed to DUO’s development or implemented it at their own institutions. Another six implementers attended as panelists to answer audience questions after the presentations. Speaker slides and a recording of the webinar are available online.
Melanie Courtot, Metadata Standards Coordinator at EMBL’s European Bioinformatics Institute (EMBL-EBI), began the webinar with a high-level overview of the DUO standard, describing the current model for depositing data, defining limitations on its use, and accessing it. This process, while common practice in genomic data sharing, is slow and labor-intensive. It also does not scale, because data use limitations are written in widely varying language and every access request must be reviewed manually before it is granted or denied.
“A community of people came together to build, deliver, and deploy a standard, focusing on addressing their existing challenges for data access at scale. DUO builds on their expertise and pre-existing efforts in data access and use,” said Courtot, who leads the DUO development team.
Courtot explained the benefits of using an ontology such as DUO. DUO can be browsed via the Ontology Lookup Service, which renders a human-readable page for each term, with a stable ID, a unique label, and an unambiguous definition. “Stability of IDs,” Courtot noted, “provides confidence to DUO users that DUO terms remain available and their meaning does not change over time.” In addition, automated software can use the hierarchical tree structure of DUO to determine access permissions.
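To illustrate how a term hierarchy can drive an automated access decision, here is a minimal Python sketch. It assumes a simplified, hypothetical three-term excerpt of the DUO permission tree and a simple rule: a request is compatible if its research purpose is the dataset’s permission term or a more specific descendant of it. Neither the excerpt nor the rule is taken from the DUO specification or from any particular implementation; the Ontology Lookup Service is the place to check the actual terms and their relationships.

```python
# Hypothetical, simplified excerpt of a DUO-like permission hierarchy.
# Each term maps to its parent term (None marks the root).
# This is an assumption for illustration, not the authoritative DUO tree.
PARENT = {
    "general research use": None,
    "health or medical or biomedical research": "general research use",
    "disease specific research": "health or medical or biomedical research",
}


def ancestors_or_self(term):
    """Yield the term itself, then each ancestor up to the root.

    Raises KeyError if a term is not in the (hypothetical) hierarchy.
    """
    while term is not None:
        yield term
        term = PARENT[term]


def is_permitted(requested_use, dataset_permission):
    """Return True if the requested use equals the dataset's permission
    term or is a more specific (descendant) term beneath it."""
    return dataset_permission in ancestors_or_self(requested_use)


if __name__ == "__main__":
    # A disease-specific study falls within a health/medical/biomedical permission.
    print(is_permitted("disease specific research",
                       "health or medical or biomedical research"))  # True
    # A broader, general research purpose does not.
    print(is_permitted("general research use",
                       "health or medical or biomedical research"))  # False
```

Walking up from the requested term, rather than down from the permission, keeps each check proportional to the depth of the tree, and it means that adding new, more specific terms under an existing permission requires no change to the matching logic.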
Jonathan Lawson, Software Product Manager at the Broad Institute and co-lead of the DUO product team, showed DUO in action using the Broad’s Data Use Oversight System (DUOS) software. Lawson explained how the DUOS algorithm uses DUO to evaluate whether a data access request is compatible with the restrictions placed on a dataset. He also demonstrated how a researcher can submit a data access request (DAR) via DUOS using either a standard data access agreement or the forthcoming “Library Card.” The Library Card is a permission granted to a researcher by their institution’s signing official (ISO) to submit DARs to a Data Access Committee (DAC), eliminating the need for ISOs to approve each DAR individually.
Once the DAR is submitted, the DAC responds by granting or denying access to the data via DUOS. The DAC is supported by the DUOS algorithm, which uses DUO to evaluate each DAR in parallel with the human reviewers.
The DUOS implementation at the Broad Institute has already proven successful with a pilot DAC: the human DAC and the DUOS algorithm reached the same decision, to grant or to deny, on all 38 data access requests submitted via DUOS.
Soichi Ogishima, a DUO implementer from ToMMo, Tohoku University, explained how the Japan Agency for Medical Research and Development (AMED) Biobank Network is implementing DUO to promote the use of both data and biospecimens stored in biobanks. Researchers using the biobanks will describe their intended research use with DUO terms, then find and apply for access to the appropriate datasets using the AMED Biobank cross-search system.
According to Ogishima, GEM Japan has benefited from the standard because DUO provides a framework that simplifies and shortens the data access process for biobank users, particularly those in industry research.
Tiffany Boughtwood of Australian Genomics gave webinar participants a view of DUO from a plain-language, participant-oriented perspective. In explaining how Australian Genomics uses DUO in its participant portal, CTRL (“control”), Boughtwood showed how DUO can be applied in research before any data are collected.
CTRL is an online platform that research participants use to make dynamic, granular choices and give consent about how their data may be used in future research. DUO provides the framework for the portal, allowing participants to specify who can be granted access to their data.
“Applying DUO gives us confidence around the future-proofing of data access,” said Boughtwood. “We really value the opportunity to contribute to the development and piloting of GA4GH standards because it allows us to make sure the outcomes will fit our research.”
Aina Jene, bioinformatician at the Centre for Genomic Regulation (CRG), gave an overview of the implementation and use of DUO at the European Genome-phenome Archive (EGA). EGA has started implementing DUO codes within its policy structure and will eventually integrate DUO into the submitter portal, so that submitters can add the codes themselves.
Jene also gave attendees high-level instructions for discovering data in the EGA using DUO codes: searching either for datasets tagged with DUO codes or for specific DUO codes.
Hayley Clissold, a policy officer for the DAC at the Wellcome Sanger Institute (WSI), discussed how her team has applied DUO codes to nearly 300 cancer datasets, a third of the datasets in WSI’s data archive. Applying DUO to the remaining datasets will be more time-consuming, Clissold explained, because their usage restrictions were entered as free text by research administrators and must be reviewed carefully to make sure the correct DUO terms are used.
In the future, WSI plans to train research administrators to use DUO tags to describe usage restrictions at submission, so that datasets are properly tagged as soon as they are available for sharing. Using DUO at WSI makes datasets easier to find, which in turn promotes their reuse.
Mikael Linden of ELIXIR AAI presented the full data access request process, using Researcher Passports for authentication and DUO for data access authorization. Researcher Passports, another product in development in the DURI Work Stream, describe an authenticated researcher’s properties and help data holders manage data access.
Linden compared the roles of DUO and Researcher Passports across three phases (discovery, DAC oversight (DACO), and data access and use) and highlighted how the two play complementary roles in streamlining access to data.
Adrian Thorogood, who manages the GA4GH Regulatory and Ethics Work Stream, spoke about aligning consent language with DUO. Thorogood announced the recently updated GA4GH Consent Policy and described its role in respecting the autonomous decisions of data subjects and increasing transparency about how data are shared.
Thorogood discussed the importance of adopting consent language that maps to DUO, and how this will shape data collection and sharing in the future. Aligning consent language with DUO encourages researchers to decide and clarify in advance how participants’ data will be shared and accessed. In turn, such mapping reduces the burden on Institutional Review Boards (IRBs) and DACs, which otherwise must interpret consent at the point of data release.
“Actively exploring how to improve the consent process now that DUO is here will really enable the effectiveness of DUO to ensure data are shared maximally while respecting commitments made to participants,” said Thorogood.
Thorogood stressed the importance of exercising care both when drafting consent forms for future data collection and when interpreting consent for previously collected data. Being too restrictive in drafting or interpreting consent prevents legitimate data sharing and slows research, while being too liberal can lead to data leakage that harms participant privacy and trust.
The presentations were followed by a brief session in which presenters and panelists took questions from webinar viewers. Melissa Konopko, who manages the DURI Work Stream, invited viewers to get involved in the development and implementation of DUO and future GA4GH standards, either through the DUO GitHub repository or by contacting her directly.
Webinar panelists:
Pinar Alper, Data Steward, Luxembourg Centre for Systems Biomedicine (LCSB)
Anthony Brookes, Professor of Genomics and Bioinformatics, University of Leicester
Francis Jeanson, Founder, Datadex Inc.
Giselle Kerry, EGA Project Coordinator and Senior Helpdesk Officer, EMBL-EBI (EGA)
Kathy Reinold, Principal Data Modeler, Broad Institute of MIT and Harvard
Heidi Sofia, Program Officer, NIH National Human Genome Research Institute (NHGRI)