25 May 2022
Human genomic data hold a wealth of information about biology and disease. Such data can also contain private information about individuals, and tight access controls can present barriers when asking scientific questions. The GA4GH Discovery Work Stream, in collaboration with ELIXIR, has released a new version of its seminal Beacon API for genomic data discovery. The updated protocol vastly expands functionality to improve the tool’s utility and places special emphasis on responsible access to clinical genomic data for research.
“The original Beacon demonstrated the tremendous interest of the genomics community to participate in data sharing efforts and a willingness to contribute to federated data discovery projects,” said Michael Baudis, a professor of bioinformatics at the University of Zurich and co-lead of the Discovery Work Stream. Baudis, along with Jordi Rambla de Argila (Centre for Genomic Regulation) and Anthony Brookes (University of Leicester), also co-leads Beacon development. “Supported by the amazing dynamics of the Beacon concept, we’ve worked on providing a flexible framework for an ‘internet of genomics,’ empowering use cases from simple discovery of relevant but privacy-protecting datasets to use in secure, clinical environments with full data access.”
The idea of international genomic beacons was raised in 2014 at the foundational meeting of GA4GH by Jim Ostell, who went on to become Director of the National Center for Biotechnology Information (NCBI). In Ostell’s words, “people [had] been scanning the universe of human research for signs of willing participants in far-reaching data sharing, but…it [had] remained a dark and quiet place.” Beacons were intended to be easy to light, and then, once shining, to grow in number and functionality. The release of Beacon v2 last month is an important step in realising this vision.
Under the initial leadership of Marc Fiume, CEO of DNAstack and co-lead of the GA4GH Discovery Work Stream, an early Beacon Network service was established to demonstrate interest in federated genomic data querying. Owing primarily to the support of ELIXIR, the European infrastructure for life sciences data, which now shares development oversight of the standard with GA4GH, the network of Beacons has grown into an international ecosystem of easily discoverable, globally distributed genomic data.
The ELIXIR Beacon Project, launched in 2017, provides a simple and accessible way to query human data from multiple datasets around the world without compromising privacy. Nevertheless, it had become clear that additional updates to the API would be needed to make it a transformative solution for clinical genomic research.
“Beacon v2 expands on our initial vision to accelerate genomics research by making data easier to find,” said Fiume. “Beacon v2 empowers genomic data stewards to retain control over their data while making it possible to search, sharing genomic variant data along with important metadata like clinical and phenotypic information that can now help researchers answer more questions in rare and complex disease.”
While v1 of the protocol simply indicated the presence or absence of an allele in a genomics dataset, the new version gives researchers more options when searching for genomic variants and adds the flexibility to ask further questions about the dataset and about the attributes of its participants. In secure settings, such as encrypted networks, authorized users can link Beacon results to privacy-protected data such as a patient’s electronic health record, and connect this to expert variant annotation. In an alternate scenario, researchers may apply for access to a dataset returned in their query results, and Beacon v2 will show them contact information and data use restrictions to assist in that process.
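The contrast between the two versions can be pictured with a small Python sketch. This is a toy, in-memory model and not the Beacon API itself: the class, the method names, and the participant "sex" attribute are all hypothetical, chosen only to show a v1-style yes/no answer next to a v2-style query that also filters on a participant attribute and returns a count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One variant observed in one participant (hypothetical toy record)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    participant_sex: str  # illustrative attribute, not a Beacon field name


class ToyBeacon:
    """In-memory stand-in for a dataset behind a Beacon (assumption)."""

    def __init__(self, observations):
        self._observations = list(observations)

    def _matches(self, chrom, pos, ref, alt):
        key = (chrom, pos, ref, alt)
        return [o for o in self._observations
                if (o.chrom, o.pos, o.ref, o.alt) == key]

    def exists(self, chrom, pos, ref, alt):
        """v1-style answer: is the allele present at all?"""
        return bool(self._matches(chrom, pos, ref, alt))

    def count(self, chrom, pos, ref, alt, sex=None):
        """v2-style answer (sketch): how many matching observations,
        optionally restricted by a participant attribute."""
        hits = self._matches(chrom, pos, ref, alt)
        if sex is not None:
            hits = [o for o in hits if o.participant_sex == sex]
        return len(hits)


# Made-up example data; coordinates and alleles carry no biological meaning.
beacon = ToyBeacon([
    Observation("1", 12345, "A", "G", "female"),
    Observation("1", 12345, "A", "G", "male"),
    Observation("2", 67890, "C", "T", "female"),
])

has_allele = beacon.exists("1", 12345, "A", "G")
female_carriers = beacon.count("1", 12345, "A", "G", sex="female")
```

In this sketch the yes/no answer reveals nothing about individual participants, while the filtered count exposes more detail, which is why richer responses of this kind would sit behind the access controls the article describes.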
“Beacon v2 was built by gathering the needs of genetic diagnostics experts as well as knowledge databases and their users,” said Jordi Rambla, team lead of the European Genome-phenome Archive in Barcelona at the Centre for Genomic Regulation (CRG), and product lead for the Beacon v2 protocol. “The participation of GA4GH Driver projects has been key in reaching the v2 milestone.” As a participating institution of ELIXIR Spain, CRG has taken a leadership role in developing the open source discovery protocol since 2018, with substantial contributions from members of the Swiss, Finnish, and UK nodes and from developers all over the world.
The strength of the Beacon technology lies in its simple interface, which enables research scientists to rapidly explore their research hypotheses prior to downloading data or requesting secure access from multiple sources. In “lighting” a Beacon, any institution or individual can make a cohort, case-centred, or research dataset available to the world as a web service — allowing researchers and clinicians to discover its contents.
Without Beacons, researchers querying human genomic data may need to request access to each dataset individually before knowing whether it contains a sequence of interest. The Beacon protocol allows users to rapidly identify the existence of specific sequences across multiple datasets and so avoid unnecessary, complicated access requests.
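The workflow described here, asking many datasets the same question before requesting access to any of them, can be sketched as a simple fan-out. The example below is a minimal illustration under stated assumptions: the beacon names and variants are invented, and in-memory sets stand in for what would in practice be network calls to separate Beacon services.

```python
def beacons_with_variant(beacons, variant):
    """Return the names of the beacons that report the variant as present.

    `beacons` maps a hypothetical beacon name to the set of variants it
    holds; a real network would query each service over HTTP instead.
    """
    return sorted(name for name, variants in beacons.items()
                  if variant in variants)


# Invented data: (chromosome, position, reference bases, alternate bases).
network = {
    "beacon_a": {("1", 12345, "A", "G"), ("2", 67890, "C", "T")},
    "beacon_b": {("2", 67890, "C", "T")},
    "beacon_c": set(),
}

hits = beacons_with_variant(network, ("2", 67890, "C", "T"))
misses = beacons_with_variant(network, ("X", 1, "G", "A"))
```

A researcher would then file access requests only with the datasets listed in `hits`, rather than with all three.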
“We’re excited to now have the Beacon v2 API standard, which allows for more detailed data sharing and access to evidence when querying for data on genomic variants. This standard will help power a federated network of variant databases that can enable secure and permissioned access to important data to inform the clinical interpretation of genetic variation.”
“Beacon v2 extends the successful Beacon standard, which has allowed users to query an international network of data, enabling more detailed queries.”
“Genomics data generated for research and healthcare can be reused beyond its original purpose. At ELIXIR we are enabling standards and infrastructure to discover data, whilst ensuring personal data privacy.”
How to access a Beacon