GA4GH 2020-2021 Roadmap

Data Use & Researcher Identities Work Stream

Product Roadmap

Motivation and Mandate

At a concrete level, data from human subjects has two axes of access control:

Researcher Identity: This axis specifies which researchers may access the dataset at any given time and the credentials they must supply. For example, access to a dataset may be limited to researchers who are members of a consortium for the first year after the data is generated.

Data Use: When human subjects are consented as participants in a study, the informed consent form specifies appropriate restrictions on secondary data use. For example, it may stipulate that the data may be used only for “cancer research in a non-profit setting.” Similarly, data owners may place additional restrictions on data use.

Each of these axes is independent—a researcher may have access to a dataset, but be unable to utilize it because her research purpose is inconsistent with the data use restrictions. Similarly, a researcher’s purpose may be entirely consistent with the data use restrictions but, because she is not a member of the consortium, she may not be able to access it.

The mandate of the Data Use and Researcher Identities (DURI) Work Stream is to create the standards required to support both of these axes of access control.

Existing Standards

Important work has been done within GA4GH and beyond along both of these axes, including:

The Library Cards and Bona Fide Researchers efforts to define researchers and their identities.

The eRA Commons, ORCID, and EGA identity systems.

Proposed Solution

The DURI Work Stream will drive progress in the following two areas:

  • Establish researcher identities – Two things are needed: i) a consistent definition of who counts as a bona fide researcher in the physical world, and ii) one or more identity providers that respect this definition and issue virtual identities that travel with the researcher across data sharing repositories.
  • Specify a data use ontology – This ontology will be used to state both the secondary data use restrictions on datasets and researchers’ purposes for wishing to access them. Expressing both in the same ontology makes it possible to compute whether a given researcher’s purpose is consistent with a given data use restriction.
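To illustrate what "computing consistency" could look like, here is a minimal sketch in Python. It assumes a toy purpose hierarchy in which a restriction permits the named purpose and anything more specific, plus a single non-profit modifier taken from the "cancer research in a non-profit setting" example. The term labels, the PARENT table, and the function names are hypothetical and are not actual DUO identifiers or a DURI-endorsed algorithm.

```python
# Toy purpose hierarchy, child -> parent.
# Labels are illustrative only; they are NOT real DUO term identifiers.
PARENT = {
    "biomedical_research": "general_research",
    "disease_research": "biomedical_research",
    "cancer_research": "disease_research",
    "population_research": "general_research",
}


def self_and_ancestors(term):
    """Return the term followed by every broader term above it."""
    chain = [term]
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def is_consistent(purpose, researcher_is_non_profit, permitted_use, non_profit_only):
    """Check a researcher's stated purpose against a dataset's restriction.

    Assumed rule: the dataset permits `permitted_use` and any narrower
    purpose, and may additionally require a non-profit setting.
    """
    if non_profit_only and not researcher_is_non_profit:
        return False
    return permitted_use in self_and_ancestors(purpose)


# Dataset consented for "cancer research in a non-profit setting".
print(is_consistent("cancer_research", True, "cancer_research", True))
print(is_consistent("population_research", True, "cancer_research", True))
```

The first call is consistent because the purpose matches the restriction and the setting is non-profit; the second is not, because population research does not fall under cancer research in this toy hierarchy.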

Approved Deliverables

Data Use Ontology

  • Type: Data Model / Ontology
  • V1 Approval Date: 2019
  • Known Implementations & Deployments: NIH All of Us Research Program, Broad Institute, National Cancer Institute, TOPMed, European Genome-Phenome Archive

DUO allows datasets to be semantically tagged with restrictions and permissions on their use, making them automatically discoverable based on users’ authorization levels or their intended data uses.
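As a rough picture of tag-based discovery, the Python sketch below filters a small catalog by a researcher's intended use. The catalog contents, the tag labels, and the discover function are all hypothetical; real deployments would use actual DUO terms and a richer matching rule than this flat lookup.

```python
# Hypothetical catalog: dataset name -> set of permitted-use tags.
# Tag labels are illustrative, not real DUO term identifiers.
CATALOG = {
    "cohort_a": {"general_research"},
    "cohort_b": {"cancer_research"},
    "cohort_c": {"cancer_research", "cardiovascular_research"},
}


def discover(catalog, intended_use):
    """Return names of datasets whose tags permit the intended use.

    Assumed rule: a dataset tagged "general_research" permits any use;
    otherwise the intended use must appear among the dataset's tags.
    """
    return sorted(
        name
        for name, tags in catalog.items()
        if "general_research" in tags or intended_use in tags
    )


print(discover(CATALOG, "cardiovascular_research"))
```

Here a researcher declaring cardiovascular research would see cohort_a and cohort_c, while cohort_b stays hidden because it is tagged for cancer research only.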

Machine-readable Consents

  • Type: Guide
  • V1 Submission Date: 2020
  • Known Implementations & Deployments: Australian Genomics, Broad Institute

The aim is to maximize ethical data sharing, integration, and re-use while respecting data subjects’ autonomy. The guide promotes data sharing consent language that maps unambiguously to DUO, which facilitates data discovery as well as data access request submissions and approvals. This is a cross work stream product developed in collaboration with the Regulatory and Ethics group.

GA4GH Passports

  • Type: Data Model / Ontology, Protocol
  • V1 Approval Date: 2019
  • Known Implementations & Deployments: ELIXIR, Google Cloud Platform

The GA4GH Passport specification aims to support data access policies and procedures within current and evolving data access governance systems. It defines Passports and Passport Visas as a standardized method of communicating the data access authorizations that a user holds based on their role (e.g. researcher), affiliation, or access status.
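The relationship between a Passport and its Visas can be sketched in Python as a list of assertions that a data repository inspects before granting access. The field names (type, value, source, asserted) and the visa type strings below are modeled on the Passport v1 visa object as commonly described and should be checked against the published specification. The URLs, the has_access helper, and the trust rule are made up for illustration, and real Visas are signed tokens whose signatures must be verified, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visa:
    type: str      # kind of assertion, e.g. "ControlledAccessGrants"
    value: str     # what is asserted, e.g. a dataset identifier
    source: str    # organization that made the assertion
    asserted: int  # Unix timestamp (seconds) of the assertion


def has_access(passport, dataset, trusted_sources):
    """Return True if any visa grants access to `dataset` from a trusted source.

    Assumed policy: only access-grant visas count, and only when the
    asserting organization is one the repository already trusts.
    """
    return any(
        visa.type == "ControlledAccessGrants"
        and visa.value == dataset
        and visa.source in trusted_sources
        for visa in passport
    )


# A passport is modeled here as a plain list of visas (hypothetical values).
passport = [
    Visa("AffiliationAndRole", "faculty@example.org",
         "https://idp.example.org", 1577836800),
    Visa("ControlledAccessGrants", "https://data.example.org/datasets/710",
         "https://dac.example.org", 1580515200),
]

print(has_access(passport, "https://data.example.org/datasets/710",
                 {"https://dac.example.org"}))
```

Keeping each assertion as a separate visa lets a repository decide independently which issuers it trusts for which kinds of claims, instead of accepting or rejecting the passport as a whole.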

Planned expansions of v1.0 include additional role values, connections with other GA4GH APIs, and additional guidance as required by Drivers adopting the standard.

  • DURI, Discovery, and Cloud cross work stream (X-WS) use case implementations leveraging the Passport and DUO standards