An international consortium federating large volumes of sensitive clinical and genomic data across virtual computing environments faces formidable challenges in assuring data confidentiality, data integrity, service availability, and individual privacy. The fact that healthcare data are a leading target for cyberattacks exacerbates these challenges.
GA4GH and its partners must implement defense in depth to protect the high-value data we rely upon to accelerate the acquisition and application of biomedical knowledge. A key mandate of the Data Security Work Stream is to help assure that the standards produced by the Technical Work Streams have been developed within a sound risk-management framework.
Some of the security challenges GA4GH faces call for innovative application of well-established security standards and protocols, such as identity federation on a global scale using OpenID Connect; distributed authorization using OAuth 2.0; transmission protection using Transport Layer Security (TLS); and data encryption using symmetric encryption algorithms such as the Advanced Encryption Standard (AES). Other challenges require solutions still emerging from security research, such as privacy-preserving data linkage, homomorphic encryption, and quantum key distribution.
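As an illustration of the transmission-protection point, the following is a minimal sketch of a client-side TLS configuration using Python's standard `ssl` module. The helper name `make_client_context` and the choice of TLS 1.2 as the minimum version are assumptions made for this example, not requirements stated by GA4GH.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Return a TLS client context that verifies the server it talks to.

    Hypothetical helper for illustration only.
    """
    # create_default_context() turns on certificate verification and
    # hostname checking for client-side connections.
    context = ssl.create_default_context()
    # Assumption for this sketch: refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_context()
```

A context built this way can be passed to standard-library clients, for example through the `context` argument of `urllib.request.urlopen`, so that every connection inherits the same verification settings.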
Risk management is central to the Data Security Work Stream’s standards-development process, which seeks to leverage industry standards and best practices wherever possible, including GA4GH-specific profiles of existing standards.
Enabling GA4GH and its partners to prevent and respond to breaches effectively requires a layered, proactive scheme to identify potential threats and vulnerabilities, continuously monitor the use of data and services, detect potential attacks, and collectively respond to potential breaches. The Data Security Work Stream will work with the Driver Projects to broadly apply breach-response methods currently in use to collaboratively protect collective data assets.
The remit of the DSWS includes, but is not limited to, identity management, access authorization and control, privacy-preserving computation, non-repudiation, accountability, service continuity, and breach detection and response. High-priority needs include:
In the genomics community, many groups lack training in security assessment and in following up on security best practices. This deliverable will have multiple parts:
We also aim to establish a group that performs assessments and provides a stream of open-source tooling to increase the automation of these assessments. This group would also train other groups and encourage the use of self-service tools.
The Cloud has gained the attention of many GA4GH projects and is increasingly used for large-scale distributed computing services. The Cloud Work Stream emerged to focus on API standards that make it easier to send algorithms to the data in such environments and to run full workflows in the Cloud.
The use of Cloud services (and outsourced services in general) poses multiple legal, ethical, and technological challenges in data transfer and processing. GA4GH should develop specific guidelines and recommendations for the secure and privacy-conscious use of Cloud services. An example issue to be addressed is the recommended policy for cryptographic key management.
The GA4GH Authentication and Authorization Infrastructure (AAI) Profile is a technical profile for managing and authenticating the identity of users, and for authorizing access requests for data and services offered through the Driver Projects. The GA4GH AAI Profile is based on the IETF OAuth 2.0 standard and on the OpenID Connect identity layer built on top of OAuth 2.0, and it incorporates the researcher identity vocabulary and data-use ontology developed by the Data Use and Researcher Identity (DURI) Work Stream.
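To make the OAuth 2.0 and OpenID Connect foundation more concrete, here is a minimal sketch of how a client might build an authorization request for the authorization code flow. The query parameters (`response_type`, `client_id`, `redirect_uri`, `scope`, `state`, `nonce`) are standard OpenID Connect request parameters. The function name, endpoint URL, client identifier, and redirect URI are hypothetical placeholders, and nothing here is taken from the GA4GH AAI Profile itself.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse


def build_authorization_url(authorization_endpoint, client_id, redirect_uri, scopes):
    """Build an authorization request URL for the authorization code flow.

    Returns the URL together with the generated state and nonce, which the
    client keeps so it can check them against the response later.
    """
    state = secrets.token_urlsafe(16)  # ties the response to this request
    nonce = secrets.token_urlsafe(16)  # ties the ID token to this request
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),  # scopes are sent space-delimited
        "state": state,
        "nonce": nonce,
    }
    return f"{authorization_endpoint}?{urlencode(params)}", state, nonce


# All values below are placeholders for illustration.
url, state, nonce = build_authorization_url(
    "https://idp.example.org/authorize",
    client_id="example-client",
    redirect_uri="https://app.example.org/callback",
    scopes=["openid", "profile"],
)
```

The `openid` scope is what marks the request as an OpenID Connect authentication request rather than a plain OAuth 2.0 authorization request. A real deployment would also need to exchange the returned code for tokens and validate them, which this sketch does not cover.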