
Data Use & Researcher Identities (DURI) vision statement

Read the 5-year vision statement of the work stream or read the full GA4GH Connect Strategic Plan.

Motivation and Mandate

At a concrete level, data from human subjects has two axes of access control:

  • Researcher Identity: This axis specifies which researchers may access a dataset at any given time and the credentials they must supply. For example, a dataset may be accessible only to members of a consortium for the first year after it is generated.
  • Data Use: When human subjects consent to participate in a study, the informed consent form specifies appropriate restrictions on secondary use of their data. For example, it may stipulate that the data be used only for “cancer research in a non-profit setting.” Data owners may also place additional restrictions on data use.

The two axes are independent. A researcher may have access to a dataset but be unable to use it because her research purpose is inconsistent with its data use restrictions. Conversely, a researcher’s purpose may be entirely consistent with the data use restrictions, yet she may be unable to access the dataset because she is not a member of the consortium.
The mandate of the Data Use and Researcher Identities (DURI) Work Stream is to create the standards required to facilitate and automate access control along both of these axes.
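To make the independence of the two axes concrete, here is a minimal Python sketch under simplifying assumptions. The classes `Researcher` and `Dataset`, the functions `identity_ok`, `data_use_ok` and `may_use`, and the consortium name are hypothetical illustrations, not a GA4GH API. The identity check encodes the "consortium members only for the first year" example (assumed to be 365 days, with the dataset open to any researcher afterwards), and the data use check is a plain exact-match on purpose strings, standing in for the ontology-based reasoning described under Proposed Solution.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical model: all names here are illustrative, not a GA4GH standard.
EMBARGO = timedelta(days=365)  # assumed length of the consortium-only "first year"


@dataclass
class Researcher:
    name: str
    consortia: set[str] = field(default_factory=set)


@dataclass
class Dataset:
    consortium: str
    generated_on: date
    permitted_purposes: set[str]  # simplified: exact-match purpose strings


def identity_ok(r: Researcher, d: Dataset, today: date) -> bool:
    """Researcher-identity axis: during the embargo only consortium members pass.
    After the embargo this sketch lets any researcher pass (an assumption)."""
    if today - d.generated_on < EMBARGO:
        return d.consortium in r.consortia
    return True


def data_use_ok(purpose: str, d: Dataset) -> bool:
    """Data-use axis: the stated purpose must be one the consent permits."""
    return purpose in d.permitted_purposes


def may_use(r: Researcher, purpose: str, d: Dataset, today: date) -> bool:
    # The axes are checked independently; use is allowed only if both pass.
    return identity_ok(r, d, today) and data_use_ok(purpose, d)


ds = Dataset("ConsortiumX", date(2024, 1, 1), {"cancer research"})
alice = Researcher("alice", {"ConsortiumX"})
bob = Researcher("bob")
today = date(2024, 6, 1)

print(may_use(alice, "cancer research", ds, today))  # True: both axes pass
print(may_use(alice, "drug marketing", ds, today))   # False: identity passes, purpose does not
print(may_use(bob, "cancer research", ds, today))    # False: purpose passes, not a member
```

Keeping the two checks as separate functions mirrors the point of the paragraph above: either one can fail while the other succeeds.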

Existing Standards

Important work has been done within GA4GH and beyond along both of these axes, including:

  • The Library Cards and Bona Fide Researchers efforts to define researchers and their identities.
  • The identity systems of eRA Commons, ORCID, and EGA.

Proposed Solution

The DURI Work Stream will drive progress in the following two areas:

  • Establish researcher identities – The community needs (i) a consistent definition of who counts as a bona fide researcher in the physical world, and (ii) one or more identity providers that respect this definition and issue virtual identities that travel with the researcher across data sharing repositories.
  • Specify a data use ontology – This ontology will be used to express both the secondary data use restrictions on datasets and researchers’ purposes for wishing to access them. Expressing both in the same ontology makes it possible to compute whether a given researcher’s purpose is consistent with a given data use restriction.
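The consistency computation described in the last item can be sketched as a subsumption check over an is-a hierarchy. The following minimal Python sketch assumes a tiny hypothetical hierarchy (`PARENT`) and hypothetical `Restriction` and `Purpose` types; the term labels are illustrative and are not identifiers from any published ontology. A purpose is treated as consistent when its topic equals, or is narrower than, the permitted topic, and when it satisfies any non-profit requirement, echoing the “cancer research in a non-profit setting” example.

```python
from dataclasses import dataclass

# Hypothetical is-a hierarchy: child term -> parent term (illustrative labels only).
PARENT = {
    "biomedical research": "general research",
    "cancer research": "biomedical research",
    "breast cancer research": "cancer research",
    "population genetics": "general research",
}


def is_a(term: str, ancestor: str) -> bool:
    """True if term equals ancestor or lies beneath it in the hierarchy."""
    current = term
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)  # climb one level; None at the root or for unknown terms
    return False


@dataclass(frozen=True)
class Restriction:
    permitted: str                # broadest purpose the consent allows
    non_profit_only: bool = False


@dataclass(frozen=True)
class Purpose:
    topic: str
    non_profit: bool


def consistent(purpose: Purpose, restriction: Restriction) -> bool:
    """A purpose is consistent if it meets every modifier and is no broader than permitted."""
    if restriction.non_profit_only and not purpose.non_profit:
        return False
    return is_a(purpose.topic, restriction.permitted)


consent = Restriction("cancer research", non_profit_only=True)
print(consistent(Purpose("breast cancer research", True), consent))   # True: narrower topic, non-profit
print(consistent(Purpose("breast cancer research", False), consent))  # False: commercial setting
print(consistent(Purpose("biomedical research", True), consent))      # False: broader than permitted
print(consistent(Purpose("population genetics", True), consent))      # False: unrelated branch
```

Walking up the hierarchy from the requested purpose means a narrower request (breast cancer research) satisfies a broader permission (cancer research), while the reverse does not. A production ontology would need richer relations and modifiers than this single-parent tree.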