Can Genomic Data Be Anonymised?

Anonymisation is the irreversible alteration of data so that its human subjects are no longer identifiable. Though this makes it incompatible with longitudinal follow-up, and is therefore generally discouraged in precision medicine, it can be an attractive option to comply with data protection law. Indeed, the GDPR does not regulate anonymised data at all, and insists on keeping data in an identifiable form for no longer than necessary for the purposes for which it is processed.

But researchers should never assume that genomic data are anonymous. This may surprise those familiar with US Institutional Review Boards, which regularly view rich genomic datasets as sufficiently de-identified that their analysis does not qualify as human subjects research regulated by the US Common Rule.

The GDPR links the assessment of identifiability to available technology. This determination cannot ignore that genomic re-identification strategies can now:

Genomic datasets that have been coded to allow re-identification, even when they may be considered de-identified according to the HIPAA Privacy Rule, can nonetheless only be considered pseudonymised at best under the GDPR. Recital 26 states that pseudonymised data remain personal data.

Yet it would be going too far to state that genetic or genomic data can never be anonymised. The mere observation, for example, that the prevalence of a BRCA mutation is roughly 0.25% of a national population, though both “genetic” and “data”, will generally not fall within the GDPR’s notion of personal (i.e. identifiable) data.

To take a practical example, the International Cancer Genome Consortium determined that although it should largely treat the non-cancerous sequencing data it had collected as personal data, genetic variants specific to tumour cells were nonetheless anonymous, with rare exceptions. It freely distributes the anonymous variants to other researchers in accordance with the principle of open science.

Therefore, whether genomic data can be anonymised for the purposes of the GDPR has to be determined on a case-by-case basis, taking into account:

  • all the means of identification, direct or indirect, reasonably likely to be used by any person, and
  • objective factors, including the costs of and the amount of time required for identification, the available technology at the time of the processing, and technological developments.

Mark Phillips is a lawyer with a background in computer science, and an academic associate at McGill University. He advises clients on and writes about various data protection issues.

