Crypt4GH: A secure method for sharing human genetic data

Image Credit: Stephanie Li, GA4GH

Widespread genomic data sharing between researchers and clinicians is crucial to maximizing the utility of datasets and delivering better patient outcomes. However, sharing information across networks and nations increases the risk of a data breach. Since genomic data are inherently sensitive due to their ability to reveal identifiable patient or participant information, the community needs strong approaches to security. 

Crypt4GH, a new standard file container format from the Global Alliance for Genomics and Health (GA4GH), allows genomic data to remain secure throughout their lifetime, from initial sequencing to sharing with professionals at external organizations.

“Strong encryption both in transit and at rest are critical requirements for Genomics England and other genomics in healthcare initiatives,” said Augusto Rendon, who is leading implementation of Crypt4GH at Genomics England, where he is Director of Bioinformatics.

Currently, researchers share genomic and other sensitive data files using industry-standard encryption, which keeps sensitive information secure during transfer but does not guarantee proper safeguarding thereafter. For convenience, a recipient is likely to store the file on their local hard drive in decrypted form rather than decrypting it anew each time it is needed. However, this could leave the data vulnerable.

“If the receiver’s hard drive were to be hacked or their computer stolen,” explains Alexander Senf, Crypt4GH Product Lead and Scientific Programmer at EMBL’s European Bioinformatics Institute (EMBL-EBI), “the sensitive patient information could fall into the wrong hands.” 

Crypt4GH overcomes this challenge by ensuring sensitive genomic data files remain encrypted throughout their lifetime. The approach uses envelope encryption, a protocol that is relatively new to research and healthcare but is increasingly common in the data security field because it enhances the security of data transfer and storage. 

In this process, the encryption is twofold: the data itself is encrypted, and so is the key for unlocking it. The recipient must have two components. First, she needs her own private key, which opens the file's encrypted header and so confirms that she is an intended recipient. Second, she needs the key specific to the file being transferred, which is stored inside that header and decrypts the data therein.
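The two-layer structure described above can be sketched in a few lines of Python. This is a minimal, non-authoritative illustration that uses only the standard library, and it makes two deliberate simplifications: the recipient's key is a shared symmetric key rather than an X25519 key pair, and the cipher is a toy HMAC-SHA256 keystream with an HMAC tag rather than the ChaCha20-IETF-Poly1305 that Crypt4GH specifies. All function names (`seal`, `open_sealed`, `envelope_encrypt`, `envelope_decrypt`, `rewrap`) are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets

# TOY CIPHER, for illustration only. Crypt4GH specifies ChaCha20-IETF-Poly1305;
# here HMAC-SHA256 in counter mode supplies a keystream and a second HMAC
# authenticates the ciphertext (encrypt-then-MAC).
NONCE_LEN = 12
TAG_LEN = 32


def _subkeys(key: bytes) -> tuple[bytes, bytes]:
    """Derive separate encryption and MAC keys from one 32-byte key."""
    enc = hmac.new(key, b"enc", hashlib.sha256).digest()
    mac = hmac.new(key, b"mac", hashlib.sha256).digest()
    return enc, mac


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt and authenticate; returns nonce || ciphertext || tag."""
    enc_key, mac_key = _subkeys(key)
    nonce = secrets.token_bytes(NONCE_LEN)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def open_sealed(key: bytes, blob: bytes) -> bytes:
    """Verify the tag, then decrypt; raise ValueError on a wrong key or tampering."""
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("blob too short")
    enc_key, mac_key = _subkeys(key)
    nonce, ciphertext, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


# ENVELOPE ENCRYPTION. Simplification: `recipient_key` is a shared 32-byte
# symmetric key; in Crypt4GH the header is instead encrypted for the
# recipient's X25519 public key and opened with her private key.
def envelope_encrypt(recipient_key: bytes, data: bytes) -> tuple[bytes, bytes]:
    data_key = secrets.token_bytes(32)      # key specific to this one file
    header = seal(recipient_key, data_key)  # layer 1: lock the data key
    body = seal(data_key, data)             # layer 2: lock the data itself
    return header, body


def envelope_decrypt(recipient_key: bytes, header: bytes, body: bytes) -> bytes:
    data_key = open_sealed(recipient_key, header)  # recipient's key opens the header
    return open_sealed(data_key, body)             # file key opens the data


def rewrap(old_key: bytes, new_key: bytes, header: bytes) -> bytes:
    """Grant a new recipient access by re-encrypting only the small header."""
    return seal(new_key, open_sealed(old_key, header))


if __name__ == "__main__":
    alice = secrets.token_bytes(32)
    header, body = envelope_encrypt(alice, b"ACGTTGCA")
    print(envelope_decrypt(alice, header, body))  # b'ACGTTGCA'
```

A useful consequence of this design, visible in `rewrap`: because only the small header depends on the recipient, sharing the file with someone new means re-encrypting the header, not the potentially very large data body.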

“Being able to access genomic data without having to transfer and decrypt large files will allow us to speed up the reporting process further,” said GA4GH Large-Scale Genomics Work Stream Lead Oliver Hofmann, who is guiding implementation of Crypt4GH at the Australian Genomics Health Alliance (AGHA) where he co-leads the national data federation and analysis program.

Importantly, Crypt4GH allows the contents of a data file to be read while the file itself remains encrypted: data are decrypted only as they are read, so the file is always “locked” when it is returned to storage. Existing applications for reading and analyzing data, such as SAMtools, can already work with Crypt4GH files, allowing users to interact with the data without keeping a decrypted copy on disk.
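Part of what makes in-place reading practical is the layout of the encrypted data: the Crypt4GH specification splits the plaintext into 64 KiB segments and encrypts each one independently with ChaCha20-IETF-Poly1305, so a reader can fetch and decrypt just the segments covering the bytes it needs. The helper below, `cipher_range`, is a hypothetical sketch of that offset arithmetic, not code from any Crypt4GH implementation. It ignores the header, whose length varies, and works with offsets relative to the start of the data section.

```python
from __future__ import annotations

# Layout of the Crypt4GH data section: plaintext is cut into 64 KiB segments,
# and each is stored as 12-byte nonce || encrypted segment || 16-byte MAC.
SEGMENT_SIZE = 65536
NONCE_SIZE = 12
MAC_SIZE = 16
CIPHER_SEGMENT_SIZE = NONCE_SIZE + SEGMENT_SIZE + MAC_SIZE  # 65564 bytes


def cipher_range(start: int, length: int) -> tuple[int, int, int]:
    """Map plaintext bytes [start, start + length) onto the encrypted data section.

    Returns (begin, end, skip): fetch ciphertext bytes [begin, end), measured from
    the first byte after the header, decrypt those segments, then discard the
    first `skip` plaintext bytes. `end` is not clipped to the file size, since
    the final segment may be shorter than 64 KiB.
    """
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length > 0")
    first = start // SEGMENT_SIZE
    last = (start + length - 1) // SEGMENT_SIZE
    begin = first * CIPHER_SEGMENT_SIZE
    end = (last + 1) * CIPHER_SEGMENT_SIZE
    skip = start - first * SEGMENT_SIZE
    return begin, end, skip


# Reading 100 bytes at plaintext offset 1,000,000 touches a single segment:
print(cipher_range(1_000_000, 100))  # (983460, 1049024, 16960)
```

In this example only one 65,564-byte encrypted segment has to be fetched and decrypted, however large the file is.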

“We would like to see tools for analyzing data in the Crypt4GH format be developed for other utilities, like GATK, in the near future,” said Senf, noting that developing tools to support Crypt4GH on other platforms will encourage widespread adoption of the standard. 

Ultimately, the Crypt4GH standard aims to allow researchers and clinicians to easily communicate private genomic data and information with reassurance that participant and patient data remain secure from transfer, to analysis, to storage.