Extensions to the GA4GH Beacon API will enable a more powerful community resource

In a letter to the editor of Nature Biotechnology published on 4 March 2019, members of the Global Alliance for Genomics and Health present current and future extensions of the Beacon API, an open-source protocol for making anonymized genomic data discoverable for research and clinical purposes.

The Beacon Project was launched in 2014 to demonstrate that genomic data centers around the world were willing and ready to share information about their data. It provided a technically and conceptually simple approach to learning where data of interest reside and thereby enabled rapid and widespread adoption.

Since its inception, 42 international organizations have used the API to make data from more than 100,000 individual, anonymized human samples across 200 datasets, including the NIH dbGaP, the European Genome-phenome Archive (EGA), and the European Variation Archive (EVA), discoverable to the clinical and research communities. The Beacon Network, a distributed search engine across the world’s public Beacons, has been queried 1.5 million times and has grown into an internationally relevant resource for easing data discoverability.

“Part of the GA4GH mission is to enable a globally federated system for genomic data sharing; the success of the Beacon API validates the importance and feasibility of this goal,” said DNAstack CEO and GA4GH Discovery Work Stream co-lead Marc Fiume. “The initial narrow focus of Beacon allowed us to have an immediate impact on the field while building a technical foundation upon which we can create even more powerful search applications across a global ‘Internet of Genomics’. We’re excited to begin rolling out new features that will make it a powerful resource for the clinical community.”

Recent enhancements include the incorporation of tiered data access, as demonstrated by Beacon implementations within ELIXIR, the European life science research infrastructure for bioinformatics. Using Beacon together with the ELIXIR Authentication and Authorization Infrastructure (AAI), users can query a Beacon to discover, and subsequently access, data that they are authorized to interact with. In this scheme, data can be (i) completely open, requiring no credentials, (ii) registered, requiring users to log in using ELIXIR AAI and provide bona fide researcher credentials, or (iii) controlled, requiring users to log in and be individually approved to access the data within a particular Beacon.
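
The three access tiers described above can be illustrated with a minimal sketch in Python. This is not the ELIXIR AAI or Beacon implementation; the `Tier`, `User`, and `can_query` names, the user fields, and the Beacon identifiers are all hypothetical, chosen only to show how the open, registered, and controlled tiers differ in what they require of a user.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    OPEN = "open"              # no credentials required
    REGISTERED = "registered"  # login plus bona fide researcher status
    CONTROLLED = "controlled"  # login plus approval for one specific Beacon


@dataclass(frozen=True)
class User:
    # Hypothetical user record; the field names are illustrative assumptions.
    logged_in: bool = False
    bona_fide_researcher: bool = False
    approved_beacons: frozenset = frozenset()


def can_query(user: User, tier: Tier, beacon_id: str) -> bool:
    """Return True if the user may query data held at the given tier."""
    if tier is Tier.OPEN:
        return True
    if not user.logged_in:
        return False
    if tier is Tier.REGISTERED:
        return user.bona_fide_researcher
    # CONTROLLED: approval is granted per Beacon, not globally.
    return beacon_id in user.approved_beacons


# Example users (made up for illustration).
anonymous = User()
researcher = User(logged_in=True, bona_fide_researcher=True)
approved = User(logged_in=True, approved_beacons=frozenset({"beacon-a"}))
```

In this sketch an anonymous user can reach only open data, a logged-in bona fide researcher can also reach registered data, and controlled data requires an approval tied to the specific Beacon being queried.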

“With these new developments, Beacon is growing from a mere data discovery resource into a tool that the clinical diagnostics community can use to improve patient outcomes,” said Michael Baudis, co-lead of the Discovery Work Stream with Fiume, co-chair of the ELIXIR Beacon Network, and a professor of Bioinformatics at the University of Zurich.

“Previous versions of the Beacon API only provided a yes or no answer to the question, ‘does this dataset contain X allele at Y genomic position?’,” said TK. “Today, that information can also be served alongside additional metadata, including allele frequencies, pathogenicity scores, and phenotypic information associated with the queried allele.”
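
The difference between a yes-or-no answer and an answer enriched with metadata can be sketched with a toy, in-memory example in Python. This is not the Beacon API itself: the `VARIANTS` table, the `BeaconAnswer` type, the `query` function, its parameters, and every value in the dataset are assumptions invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Toy dataset keyed by (chromosome, position, allele).
# The variant and its metadata values are made up for illustration.
VARIANTS = {
    ("1", 1000, "A"): {"allele_frequency": 0.01, "phenotype": "example"},
}


@dataclass
class BeaconAnswer:
    exists: bool                     # the original yes/no answer
    metadata: Optional[dict] = None  # extra detail, only when requested


def query(chromosome: str, position: int, allele: str,
          include_metadata: bool = False) -> BeaconAnswer:
    """Answer: does this dataset contain the allele at this position?"""
    record = VARIANTS.get((chromosome, position, allele))
    if record is None:
        return BeaconAnswer(exists=False)
    # Copy the record so callers cannot mutate the dataset.
    return BeaconAnswer(exists=True,
                        metadata=dict(record) if include_metadata else None)


basic = query("1", 1000, "A")                        # yes/no only
rich = query("1", 1000, "A", include_metadata=True)  # yes plus metadata
absent = query("1", 1000, "T")                       # allele not present
```

Here `basic` carries only the presence flag, while `rich` additionally returns the stored metadata for the same allele.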

In addition to ELIXIR, the Beacon API has been implemented in a diverse cross section of domains to enable a variety of discovery use cases:

  • The 1000 Genomes Project: to serve large-scale population sequencing data
  • PolyPhen: to provide in silico predictions for clinical diagnostics
  • Human Gene Mutation Database (HGMD): to serve data from expertly curated or crowd-sourced databases
  • ClinVar: to serve data from variant curation efforts
  • The International Cancer Genome Consortium (ICGC): to share case-level somatic variant observations from over 60 cancer subtypes
  • BRCA Exchange: to distribute consensus classifications for variants in BRCA1 and BRCA2, cataloged by the ENIGMA consortium and additional resources

Third-party organizations, such as Cafe Variome, DNAstack, Global Gene Corp, Genecloud, and Google Cloud, also allow genetic variation datasets stored in their systems to be served via the Beacon API.

“To fully realize the potential of the Beacon API to enable unprecedented discovery and access to genomics and clinical datasets, it must be extended,” said Fiume. “We are actively adding support for even more different data types, attributes, and functionality.”

For example, the Discovery Work Stream is developing a method to “hand off” Beacon results to external platforms such as the Matchmaker Exchange, a federated network for rare disease gene discovery, so that users can access further information about a queried variant.

Such developments will afford greater richness and utility to the Beacon protocol as it continues to evolve from its initial form into one of the world’s first demonstrations of a globally federated ecosystem for genomic data discovery.