GDPR Brief: can genomic data be anonymised?

10 Oct 2018

Anonymisation is the irreversible alteration of data so that its human subjects are no longer identifiable. Because this makes it incompatible with longitudinal follow-up, it is generally discouraged in precision medicine, but it can be an attractive way to comply with data protection law. Indeed, the GDPR does not regulate anonymised data at all, and it requires that data be kept in an identifiable form for no longer than necessary for the purposes for which they are processed.

But researchers should never assume that genomic data are anonymous. This may surprise those familiar with US Institutional Review Boards, which regularly view rich genomic datasets as sufficiently de-identified that their analysis does not qualify as human subjects research regulated by the US Common Rule.

The GDPR links the assessment of identifiability to available technology. This determination cannot ignore the genomic re-identification strategies that now exist.

Genomic datasets that have been coded in a way that allows re-identification, even when they may be considered de-identified according to the HIPAA Privacy Rule, can nonetheless be considered pseudonymised at best under the GDPR. Recital 26 states that pseudonymised data remain personal data.
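To make the idea of coding concrete, the following is a minimal Python sketch of one common way to code a dataset, using a keyed hash. It is an illustration only, not a method drawn from the GDPR or from any particular project. The record fields, the identifiers, and the function names `make_code`, `pseudonymise` and `reidentify` are all hypothetical.

```python
import hashlib
import hmac
import secrets


def make_code(participant_id: str, key: bytes) -> str:
    """Derive a stable study code from an identifier with HMAC-SHA256."""
    digest = hmac.new(key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def pseudonymise(records: list, key: bytes) -> list:
    """Replace the direct identifier in each record with its study code."""
    return [
        {"code": make_code(r["participant_id"], key), "genotype": r["genotype"]}
        for r in records
    ]


def reidentify(code: str, candidate_ids: list, key: bytes):
    """Whoever holds the key can recompute codes and recover the identifier."""
    for pid in candidate_ids:
        if hmac.compare_digest(make_code(pid, key), code):
            return pid
    return None


# Hypothetical records: identifiers and genotype strings are made up.
key = secrets.token_bytes(32)
records = [
    {"participant_id": "P-0001", "genotype": "rs0000001:AG"},
    {"participant_id": "P-0002", "genotype": "rs0000001:GG"},
]
coded = pseudonymise(records, key)
```

In this sketch the coded records no longer carry the direct identifier, yet anyone holding `key` can call `reidentify` and link a code back to a participant. That retained link is why such a dataset is pseudonymised rather than anonymised. The sketch also leaves the genotype field untouched, so it does nothing about re-identification from the genomic data themselves.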

Yet it would be going too far to state that genetic or genomic data can never be anonymised. The mere observation, for example, that the prevalence of a BRCA mutation is roughly 0.25% of a national population is both “genetic” and “data”, but will generally not fall within the GDPR’s notion of personal (i.e. identifiable) data.

To take a practical example, the International Cancer Genome Consortium determined that although it should largely treat the non-cancerous sequencing data it had collected as personal data, genetic variants specific to tumour cells were nonetheless anonymous, with rare exceptions. It freely distributes the anonymous variants to other researchers in accordance with the principle of open science.

Therefore, whether genomic data can be anonymised for the purposes of the GDPR has to be determined on a case-by-case basis, taking into account:

  • all the means of identification, direct or indirect, reasonably likely to be used by any person, and
  • objective factors, including the costs of and the amount of time required for identification, the available technology at the time of the processing, and technological developments.

Mark Phillips is a lawyer with a background in computer science, and an academic associate at McGill University. He advises clients on and writes about various data protection issues.

Please note that GDPR Briefs neither constitute nor should be relied upon as legal advice. Briefs represent a consensus position among Forum Members regarding the current understanding of the GDPR and its implications for genomic and health-related research. As such, they are no substitute for legal advice from a licensed practitioner in your jurisdiction.