New strategies for securing Beacon datasets

April 7th 2017
academic papers, beacon, community, meetings, privacy and security

In the United States in 2015, the number one target for cyber-attackers was health data. While that number is shrinking thanks to efforts by the security and privacy research community, health data are still a very common target of cyber-attacks. “Maybe not for the health information per se, but for the identity information they contain,” said Dixie Baker, co-chair of the GA4GH Security Working Group (SWG) and a partner at Martin, Blanck and Associates. “Genomic data are identity data,” she said. That’s why it’s incredibly important that, as it moves forward in its mission to advance data sharing, GA4GH takes great care to develop solutions that protect participants’ privacy and security.

In a paper recently published in the Journal of the American Medical Informatics Association, Baker and a team of GA4GH volunteers led by genomics security expert Jean-Pierre Hubaux put forth three new strategies for mitigating the risks of cyber-attacks on Beacon datasets.

Beacon is a GA4GH demonstration project which offers a lightweight approach to determining whether a dataset contains a particular allele of interest. While the approach inherently limits the risk of sensitive information being leaked, there are still theoretical opportunities for an attacker to verify that an individual's data are included in a Beacon and thereby infer characteristics associated with that dataset. Such attacks depend on the adversary already having access to genomic information about an individual or a close relative.
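To make the idea concrete, here is a minimal sketch of a Beacon-style lookup. It is an illustration only, not the Beacon project's actual API: the `Allele` tuple, the `beacon_has_allele` function, and the toy dataset are all hypothetical names and values chosen for this example.

```python
from typing import Iterable, NamedTuple, Set


class Allele(NamedTuple):
    """Hypothetical allele key: chromosome, position, alternate base."""
    chromosome: str
    position: int
    alternate: str


def beacon_has_allele(dataset: Iterable[Set[Allele]], query: Allele) -> bool:
    """Answer only yes or no: does any genome in the dataset carry the allele?"""
    return any(query in genome for genome in dataset)


# Toy dataset (made up for illustration): one set of alleles per genome.
dataset = [
    {Allele("1", 12345, "A"), Allele("2", 67890, "T")},
    {Allele("1", 12345, "A")},
]

print(beacon_has_allele(dataset, Allele("1", 12345, "A")))
print(beacon_has_allele(dataset, Allele("3", 111, "G")))
```

The caller learns a single yes or no per query and never sees which genome matched. An adversary who already holds someone's alleles could still combine many such answers, which is the risk described above.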

“Attacks like these are unlikely in practice,” said Paul Flicek, SWG co-chair and Head of Genes, Genomes and Variation Resources at the European Bioinformatics Institute. “If you have someone’s whole genome, it is unlikely that your first or even your fiftieth point of call would be a Beacon, but that doesn’t mean that it’s impossible. And therefore we should be realistic about understanding what the risks are and where the limits of the risks are.”

The work is the result of an international collaboration among volunteer security experts who have dedicated their time to the GA4GH SWG. The SWG is currently expanding and actively recruiting new volunteer members.

Each mitigation strategy tackles the problem from a different angle, and together they provide Beacon owners a collection of solutions to choose from when determining how best to secure their particular datasets, said Hubaux, a professor of computer science at École Polytechnique Fédérale de Lausanne.

For instance, under one strategy, a Beacon would answer “yes” to a query about an allele only if that allele were present in two or more genomes in the dataset. This approach would work well for many Beacons, as it introduces noise into the original data and significantly decreases the power of the attack. Another approach uses a privacy budget that tracks the information disclosed about each individual in the dataset.
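The two strategies just described can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: the `GuardedBeacon` class, its method names, the string allele keys, and the flat per-query cost charged against each person's budget are all hypothetical choices made for this example.

```python
from typing import Dict, List, Set


class GuardedBeacon:
    """Toy Beacon with two optional safeguards (all names are hypothetical)."""

    def __init__(self, genomes: Dict[str, Set[str]],
                 min_carriers: int = 2,
                 budget_per_person: float = 3.0) -> None:
        self.genomes = genomes            # person id -> set of allele keys
        self.min_carriers = min_carriers  # threshold for the first strategy
        # Assumption: every person starts with the same budget.
        self.budget = {pid: budget_per_person for pid in genomes}

    def _carriers(self, allele: str) -> List[str]:
        return [pid for pid, alleles in self.genomes.items() if allele in alleles]

    def query_threshold(self, allele: str) -> bool:
        """Say yes only if at least `min_carriers` genomes carry the allele."""
        return len(self._carriers(allele)) >= self.min_carriers

    def query_budget(self, allele: str, cost: float = 1.0) -> bool:
        """Say yes only on behalf of carriers who still have budget left.

        Assumption: each yes costs every contributing carrier a flat `cost`.
        """
        eligible = [pid for pid in self._carriers(allele)
                    if self.budget[pid] >= cost]
        for pid in eligible:
            self.budget[pid] -= cost
        return bool(eligible)


# Toy data (made up): "rare" is carried by one person, "common" by two.
beacon = GuardedBeacon({"p1": {"common", "rare"}, "p2": {"common"}},
                       budget_per_person=2.0)
print(beacon.query_threshold("common"))
print(beacon.query_threshold("rare"))
print(beacon.query_budget("rare"))
```

In this sketch the threshold rule hides the allele carried by a single person, while the budget rule still reveals it until that person's budget runs out. That is the trade-off between privacy and discoverability of rare variants that each Beacon owner has to weigh.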

“A Beacon for rare variants, for example, might rely on the privacy budget strategy. You would not want to limit discoverability by requiring multiple matches,” said Knox Carey, a member of the SWG, Vice President of Healthcare Initiatives at Intertrust Technologies Corporation, and General Manager of its Genecloud project. “On the other hand, requiring multiple matches is a very simple approach that is suitable for many other data sets. Our paper proposes different models for different circumstances.”

Genomic and health-related data hold great potential to advance human health. But in order to make good on that promise, researchers need access to datasets that are larger than any single institution can collect on its own. “In the GA4GH, our mission is to make data sharing easier,” Carey said. “Our hope is that risk mitigation efforts like this will encourage even more data sharing and discovery.”

As the GA4GH expands its offerings for genomic data sharing, including evolving the Beacon project, guidance from the genomic privacy and security research community will be critical. “It is known that personalized health raises a number of security and privacy concerns,” said Hubaux. “This paper shows that it is possible to devise reasonable countermeasures, with modest overhead. It’s of particular relevance because it brings together contributions from several teams around the globe in a field that demands rigorous collaboration and coordination.”

Researchers are invited to get involved with the GA4GH Security Working Group and Beacon Project by contacting the SWG coordinator today. In addition, the 4th International Workshop on Genome Privacy and Security (GenoPri'17) will be held in Orlando, Florida, USA, in October 2017, and researchers with expertise in this field are strongly encouraged to attend.

1. Morgan, Steve. “Top 5 Industries At Risk Of Cyber-Attacks.” Forbes. 13 May 2016.

2. IBM Security. “IBM X-Force Threat Intelligence Index 2017.” IBM. March 2017.
