NIH All of Us Program to Implement GA4GH Data Sharing Standards


May 11th 2018

This week, the All of Us Research Program — a GA4GH Driver Project that seeks to gather sequencing and other data from one million or more people living in the United States — launched a nationwide enrollment effort. Prior to this, the program was operating in select academic medical centers around the country; now any individual in the US can enroll online to contribute to the historic effort.

In contrast to many other national initiatives under way around the globe, All of Us has no specific disease focus. Instead, it seeks to create a hypothesis-free dataset that “reflects the rich diversity of America, including volunteers of many races and ethnicities, age groups, geographic regions, gender identities, sexual orientations, and health statuses,” according to the project’s website.

As a GA4GH Driver Project, All of Us has committed to implementing several of the standards being developed by the GA4GH Work Streams. In particular, the project is contributing to the work of the Cloud Work Stream and the Data Use and Researcher Identities (DURI) Work Stream.

“I expect we'll be ramping up our GA4GH interaction as more data, including genomic, starts to flow,” said David Glazer, one of two Driver Project Champions for the project, co-lead of the GA4GH Cloud Work Stream, and Engineering Director at Verily Life Sciences. “Right now the program is in data-gathering mode; the first broad researcher access is planned for about a year from now.” Glazer leads Verily’s contribution to the All of Us Data Research Center.

Currently, All of Us is collecting demographic information from participants. A subset of these individuals will be invited to share biosamples, such as blood, urine, or saliva. These samples will be stored in a biobank being built by the Mayo Clinic in Rochester, Minnesota. It will be the largest biobank ever attempted and will require a unique approach to both the physical infrastructure and the technologies used for sample acquisition and tracking.

To manage the data coming from the biobank and other sources, the NIH has funded a Data and Research Center (DRC). The DRC is located primarily at Vanderbilt University, with subawards to Verily and the Broad Institute of MIT and Harvard for data storage and researcher access, respectively.

“The DRC’s primary remit is to hold all of the data coming from a variety of sources — including sequencing centers, electronic health records, participant surveys, and clinical assessments — and make it available in a way that will lead to insights into the many facets of disease, while appropriately protecting participants’ data and resources,” said David Siedzik, Associate Director of Portfolio and Engagement Strategy at the Broad Data Sciences Platform and a member of the DRC team. “One major goal for us is to make data as broadly available as possible, including to diverse communities such as disease advocates, biomedical researchers, citizen scientists, and more.”

The DRC aims to use a shared global approach for verifying researcher identity, said Anthony Philippakis, the other All of Us Driver Project Champion, co-lead of the DURI Work Stream, and Chief Data Officer at the Broad. Philippakis has led the development of the All of Us “research workbench,” a web-based platform that will allow researchers to log in and access data. “The access regime will be driven by policy,” said Philippakis, “and that’s primarily where GA4GH has a direct impact.”

The DURI Researcher Identity task team is developing a verification system that uses a “data passport” model. Like a national passport, the data passport allows an already verified individual to cross restricted borders. In this case, the border surrounds a dataset containing sensitive information that participants have consented to share only for specific research purposes.

“The first sequencing awards will be announced later this year,” Glazer said. “This will bring in more GA4GH standards, including of course CRAM and VCF.” These specifications for storing read and variant call data, respectively, were developed as part of the 1000 Genomes Project and are now maintained by the GA4GH Large Scale Genomics Work Stream.

All of Us is also looking at using the library of standards being developed by the Cloud Work Stream to let researchers share tools, as well as the Cloud Work Stream’s workflow standards to let them run batch analyses, Glazer said.

“With the launch of nationwide enrollment, the All of Us Research Program will begin building the largest dataset of its kind,” said Peter Goodhand, GA4GH Chief Executive Officer. “We are very pleased that the project is working with GA4GH to develop and adopt standards to enable interoperability.”

All of Us, the All of Us logo, Precision Medicine Initiative, PMI and The Future of Health Begins with You are service marks of the U.S. Department of Health and Human Services (HHS).