At a concrete level, data from human subjects has two axes of access control:
Researcher Identity: Which researchers may access a dataset at any given time, and which credentials they must supply. For example, only members of a consortium may be permitted to access a dataset during the first year after it is generated.
Data Use: When human subjects consent to participate in a study, the informed consent form specifies restrictions on secondary use of their data. For example, it may stipulate that the data be used only for “cancer research in a non-profit setting.” Data owners may also impose additional restrictions on data use.
The two axes are independent: a researcher may have access to a dataset but be unable to use it because her research purpose is inconsistent with the data use restrictions. Conversely, a researcher’s purpose may be entirely consistent with the data use restrictions, yet she may be unable to access the dataset because she is not a member of the consortium.
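As a minimal sketch of this independence, the two axes can be modeled as separate checks that must both pass. The `Dataset` and `Researcher` classes, the one-year consortium embargo, and the free-text purpose strings are hypothetical illustrations drawn from the examples above; they are not part of any GA4GH standard.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Dataset:
    # Hypothetical model: the identity rule and the data-use rule are stored separately.
    generated_on: date
    consortium: str
    allowed_purposes: set[str] = field(default_factory=set)
    embargo: timedelta = timedelta(days=365)  # consortium-only window (illustrative)


@dataclass
class Researcher:
    name: str
    consortia: set[str]


def identity_ok(ds: Dataset, r: Researcher, today: date) -> bool:
    # Researcher Identity axis: during the embargo, only consortium members qualify.
    if today < ds.generated_on + ds.embargo:
        return ds.consortium in r.consortia
    return True


def use_ok(ds: Dataset, purpose: str) -> bool:
    # Data Use axis: the stated purpose must be one the consent permits.
    return purpose in ds.allowed_purposes


def can_use(ds: Dataset, r: Researcher, purpose: str, today: date) -> bool:
    # Both axes are evaluated independently and both must pass.
    return identity_ok(ds, r, today) and use_ok(ds, purpose)
```

Keeping the checks separate mirrors the two axes: changing a researcher's consortium membership never changes which purposes are permitted, and vice versa.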
The mandate of the Data Use and Researcher Identities (DURI) Work Stream is to create the standards needed to support both axes of access control.
Important work has been done within GA4GH and beyond along both of these axes, including:
The Library Cards and Bona Fide Researchers efforts to define researchers and their identities.
The eRA Commons, ORCID, and EGA identity systems.
The DURI Work Stream will drive progress in the following two areas:
Data Use Ontology (DUO): DUO allows datasets to be semantically tagged with restrictions and permissions on their use, making them automatically discoverable based on users’ authorization levels or their intended data uses.
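The kind of automatic matching that DUO tags enable can be sketched as follows. This is a hypothetical, heavily simplified subset: it uses only the DUO shorthand codes GRU (general research use), HMB (health/medical/biomedical research), and DS (disease-specific research), plus the NPU (not-for-profit use only) modifier, and it encodes the matching rules by hand. Real DUO terms are ontology classes with their own identifiers, and production matching tools apply richer rules than these.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetTags:
    # Simplified DUO-style tags (hypothetical model, not the DUO schema).
    permission: str                      # "GRU", "HMB" or "DS"
    disease: Optional[str] = None        # required when permission == "DS"
    modifiers: frozenset[str] = frozenset()  # e.g. {"NPU"}


@dataclass(frozen=True)
class IntendedUse:
    category: str                        # "general", "biomedical" or "disease"
    disease: Optional[str] = None
    non_profit: bool = False


def permission_ok(tags: DatasetTags, use: IntendedUse) -> bool:
    if tags.permission == "GRU":         # any research purpose
        return True
    if tags.permission == "HMB":         # health/medical/biomedical, incl. disease work
        return use.category in {"biomedical", "disease"}
    if tags.permission == "DS":          # only research on the named disease
        return use.category == "disease" and use.disease == tags.disease
    return False                         # unknown code: deny by default


def modifiers_ok(tags: DatasetTags, use: IntendedUse) -> bool:
    # NPU restricts use to not-for-profit settings.
    return not ("NPU" in tags.modifiers and not use.non_profit)


def discover(catalog: dict[str, DatasetTags], use: IntendedUse) -> list[str]:
    # Return the datasets whose tags are compatible with the intended use.
    return [name for name, tags in catalog.items()
            if permission_ok(tags, use) and modifiers_ok(tags, use)]
```

For instance, a dataset consented for “cancer research in a non-profit setting” could be tagged `DatasetTags("DS", "cancer", frozenset({"NPU"}))`; it is then discovered for a non-profit cancer study but not for a commercial one.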
To maximize ethical data sharing, integration, and re-use while respecting data subjects’ autonomy, studies should adopt data-sharing consent language that maps unambiguously to DUO. Such language facilitates data discovery and the submission and approval of data access requests. This is a cross-workstream product developed in collaboration with the Regulatory and Ethics group.
GA4GH Passport: The GA4GH Passport specification aims to support data access policies and procedures within current and evolving data access governance systems. It defines Passports and Passport Visas as a standardized way to communicate the data access authorizations a user holds based on their role (e.g., researcher), affiliation, or access status.
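A minimal sketch of how a data-holding service might consume Visas follows. It assumes each Visa has already been verified and decoded from its JWT into a Python dict; signature verification and issuer trust, which a real deployment must perform, are omitted. It reads the `type`, `value`, and `source` fields of the `ga4gh_visa_v1` claim and the standard JWT `exp` claim. The helper names, the trusted-source set, and the example.org URLs are hypothetical.

```python
import time
from typing import Optional


def visa_claim(payload: dict) -> dict:
    # Each decoded Visa payload carries its assertion in the "ga4gh_visa_v1" claim.
    return payload.get("ga4gh_visa_v1", {})


def is_current(payload: dict, now: float) -> bool:
    # Treat a Visa without "exp", or past its "exp", as unusable.
    return payload.get("exp", 0) > now


def has_controlled_access(visas: list[dict], dataset_url: str,
                          trusted_sources: set[str],
                          now: Optional[float] = None) -> bool:
    # Access status: look for an unexpired ControlledAccessGrants Visa for this
    # dataset, asserted by a source this service chooses to trust (assumption).
    now = time.time() if now is None else now
    return any(
        visa_claim(p).get("type") == "ControlledAccessGrants"
        and visa_claim(p).get("value") == dataset_url
        and visa_claim(p).get("source") in trusted_sources
        and is_current(p, now)
        for p in visas
    )


def affiliations(visas: list[dict], now: Optional[float] = None) -> list[str]:
    # Role and affiliation: collect unexpired AffiliationAndRole values.
    now = time.time() if now is None else now
    return sorted(
        visa_claim(p)["value"]
        for p in visas
        if visa_claim(p).get("type") == "AffiliationAndRole" and is_current(p, now)
    )
```

In this sketch a `ControlledAccessGrants` Visa conveys access status, while an `AffiliationAndRole` Visa conveys role and affiliation, corresponding to the authorization categories the specification communicates.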
Planned expansions to v1.0 include additional role values, connections with other GA4GH APIs, and additional guidance as required by the Driver Projects adopting the standard.