17 December 2019
The GA4GH Steering Committee has approved the GA4GH Passports and Authentication & Authorization Infrastructure (AAI) specifications produced by the GA4GH Data Use & Researcher Identities (DURI) and Data Security Work Streams, respectively. The two standards work in conjunction to reliably authenticate a researcher’s digital identity and automate their access to a requested genomic dataset.
Currently, Data Access Committees (DACs) at data repositories such as the European Genome-Phenome Archive (EGA) or the NIH Database of Genotypes and Phenotypes (dbGaP) grant access to research data on a project basis, reviewing each research team’s proposed project before granting access to a dataset (see video: “Why DACs?”). DACs often have difficulty verifying an applicant researcher’s identity, so processing a single request can take months. Multiply that by thousands of researchers requesting access and the process quickly becomes unscalable and unsustainable. Furthermore, DAC approval is limited to a single project, meaning submission, review, and approval must be repeated each time a researcher wants to access the data.
To make use of the vast genomic data and computational resources available today, data access must be able to scale. The amount of genomic data generated around the globe is growing exponentially, which will lead to corresponding increases in the number of data access requests.
The new standards introduce a level of efficiency by supporting automation of the data access process. AAI lays the foundation for a federated mechanism for authenticating an individual’s identity and authorizing their access to an underlying dataset. Building on the OpenID Connect standard maintained by the OpenID Foundation, GA4GH’s AAI specification introduces the concept of an “access token” that can be passed around the internet and repurposed for subsequent data access requests without additional manual labor.
“Importantly, GA4GH AAI is ‘domain agnostic,’ meaning it can be applied to any type of dataset, genomic or otherwise,” said David Bernick, Chief Security Officer at the Broad Institute of MIT and Harvard, co-lead of the GA4GH Data Security Work Stream, and AAI product lead.
The GA4GH Passports specification then builds on AAI by layering on requirements related to access policy. Whereas AAI provides the mechanism for a user to identify themselves by logging in and for transporting claims about that user, Passports provide the data format that lets those claims express permissions related to datasets, user roles, resources, and more.
“The GA4GH Passport uses the AAI access token to transport a researcher’s digital identity and permissions across organizations, tools, and environments and then maps access to data across these,” said Craig Voisin, a software engineer at Google and co-lead of the RI Subgroup of the DURI Work Stream. “Furthermore, it handles federation: there can be multiple organizations, multiple tools, and multiple environments that can all work together within one pipeline analysis.”
Similar to the country visas familiar to international travelers, a GA4GH Visa can be used again and again to enter a space during a time-limited window. Visas can last minutes, days, or years and allow researchers to ‘enter’ a specific digital environment. For example, a researcher can combine data from two or more datasets spanning multiple clouds, provided they have been authorized to access each. Some of these environments may use different tools to evaluate passports and deliver access permissions.
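The time-limited nature of a visa can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Visa` class, its field names, and the `usable_visas` helper are hypothetical and are not taken from the Passports specification, which defines its own data format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Visa:
    """Hypothetical, simplified visa: one assertion with a validity window."""
    visa_type: str        # illustrative label, e.g. "dataset-access"
    value: str            # what is asserted, e.g. a dataset identifier
    issued_at: datetime   # start of the validity window
    lifetime: timedelta   # minutes, days, or years

    def is_valid(self, now: datetime) -> bool:
        # Valid from issuance up to, but not including, the expiry instant.
        return self.issued_at <= now < self.issued_at + self.lifetime


def usable_visas(visas, now):
    """Return only the visas whose validity window covers `now`."""
    return [v for v in visas if v.is_valid(now)]


if __name__ == "__main__":
    issued = datetime(2019, 12, 17, tzinfo=timezone.utc)
    short = Visa("dataset-access", "dataset-A", issued, timedelta(minutes=30))
    long_lived = Visa("dataset-access", "dataset-B", issued, timedelta(days=365))
    one_day_later = issued + timedelta(days=1)
    # Only the year-long visa is still usable a day after issuance.
    print([v.value for v in usable_visas([short, long_lived], one_day_later)])
```

Within its window, the same visa can be presented repeatedly; once the window closes, a service evaluating it would be expected to reject it.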
The GA4GH Passport specification enables this process for both registered- and controlled-access datasets. For registered access, a passport clearinghouse, the entity that vets a user’s passport, need only verify that the individual (a) is a bona fide researcher and (b) has agreed to a set of ethics terms (as previously established by the GA4GH Regulatory & Ethics Work Stream).
On the other hand, access to controlled datasets is limited to approved uses for a specific research project and research team. In this case, the clearinghouse must verify the researcher’s dataset access visas, which provide authorization to use the data.
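The difference between the two access tiers can be sketched as a small decision function. This is a simplified illustration under stated assumptions, not the specification's data model: the `Passport` class, its fields, the tier names, and `clearinghouse_allows` are all hypothetical, and a real clearinghouse would also verify signatures, issuers, and expiry.

```python
from dataclasses import dataclass, field


@dataclass
class Passport:
    """Hypothetical, simplified passport; not the specification's format."""
    bona_fide_researcher: bool = False
    accepted_ethics_terms: bool = False
    # Dataset identifiers for which the holder carries an access visa.
    dataset_grants: set = field(default_factory=set)


def clearinghouse_allows(passport: Passport, dataset_id: str, tier: str) -> bool:
    """Decide access for one dataset under the given (assumed) tier name."""
    if tier == "registered":
        # Registered access: researcher status plus agreement to ethics terms.
        return passport.bona_fide_researcher and passport.accepted_ethics_terms
    if tier == "controlled":
        # Controlled access: a visa authorizing this specific dataset.
        return dataset_id in passport.dataset_grants
    raise ValueError(f"unknown access tier: {tier!r}")


if __name__ == "__main__":
    p = Passport(bona_fide_researcher=True, accepted_ethics_terms=True,
                 dataset_grants={"dataset-A"})
    print(clearinghouse_allows(p, "dataset-X", "registered"))
    print(clearinghouse_allows(p, "dataset-A", "controlled"))
    print(clearinghouse_allows(p, "dataset-B", "controlled"))
```

In this sketch the registered-access check ignores the dataset identifier entirely, which mirrors the idea that registered status is not tied to a case-by-case grant, while the controlled-access check depends on a grant for the specific dataset.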
“We are really moving the needle with the Passports’ bona fide researcher Registered Access status, which is basically an access pass to use many datasets without the need for case-by-case review,” said Stephanie Dyke, ethics & policy researcher at McGill University and co-lead of the RI Subgroup of the DURI Work Stream. “Registered Access status should accelerate access for researchers around the world, and we are currently extending the concept for clinical care professionals too.”
Working together with other DURI Work Stream standards, including the approved GA4GH Data Use Ontology (DUO), the Passports and AAI specifications standardize data access and use restrictions, streamline authorization and authentication processes, and aim to reduce the time DACs must spend to make data access decisions.
“We anticipate that the Passports and AAI specifications will further our progress towards improving data access by helping Data Access Committees and data services automate their processes,” said Tommi Nyrönen, head of ELIXIR Finland and co-lead of the DURI Work Stream. “While DACs will still review data access requests, standardisation of the identity and access token content in Passports will significantly expedite the request process and move us closer towards automation. The standards make it easier to share and access data across the world, which helps advance our research and our collective understanding of human health and disease.”