GDPR Brief: storage limitations for genomic and health-related personal data

3 Aug 2020


Under the GDPR, personal data may only be kept if strictly necessary to fulfil the purpose of processing (Art. 5(1)(e) and Recital 39). Under Art. 5(1)(b), data may be processed further for research beyond the initial purpose. Consistent with this provision, Art. 5(1)(e) also allows data to be retained beyond the original purpose if they are used exclusively for scientific research. Such retention requires appropriate technical and organisational safeguards in accordance with Art. 89(1). In principle, data used purely for research purposes may therefore be kept indefinitely, although not all data protection authorities unreservedly share this position.

The special provisions of the GDPR on data retention for research are, however, not a carte blanche to hang on to all research data. The following conditions must be met:

Continued retention must not conflict with the conditions under which the data were obtained. If data are kept beyond the initial purpose, they may only be used for scientific research. A robust framework of technical and organisational measures therefore needs to be in place to prevent any other use, whether intentional or accidental. The utility of the data for research should also be evidenced and linked to the retention policy.

While the GDPR sets no limit on how long data may be kept for scientific research, the provisions on the information that must be given to data subjects (Articles 13 and 14; Recital 39) require that, where no definite period applies, criteria be defined for how long the data will be kept. The Article 29 Working Party’s Guidelines on transparency state that a notice specifying “as long as necessary” is not sufficient. Measurable criteria for how long the data remain useful for research must be established and, where relevant, for how long pseudonymisation keys are kept. Potential criteria include:

  • Continued utility of data
    A future use must be plausible, although it need not be defined. This includes auditable documentation of the data, with metadata that enable their future use in research. Another factor is that the data have not become obsolete, e.g. due to quality shortcomings.
  • Link to the data subject
    Justifications can include maintaining links to previously independent datasets, or the ability to link back to data subjects to contact them and vice versa. Where a direct link to the data subject is not necessary, an analysis should assess whether anonymisation would compromise the data's utility for research too much.
  • Uniqueness of data
    Retention may be justified where the data cannot be recovered, or where recovering them would impair research or require disproportionate effort. For example, phenotypic data reflect a patient's health status at a specific point in time and cannot be recreated. For molecular data, retention decisions should weigh the exhaustible nature of the biosamples from which such data are derived against the improving quality of newer technologies, which may make legacy data obsolete.
    On the other hand, where data are obtained from a central repository, there is no need to also keep them locally, unless this is required for the reproducibility of research results.

Research institutions also need to define a time frame and establish mechanisms to periodically review whether the retention criteria are still fulfilled.

In line with the “data protection by design” principle of Article 25, these considerations on data retention must be made and documented before data collection starts. This includes deciding whether the data will be deleted or anonymised at the end, and communicating this decision to the data subjects. Data protection by default implies that a defined retention period should be assumed. A data retention policy must therefore specify how it is established that data may be kept for an as yet undefined period, define the relevant retention criteria throughout the data’s life cycle, and set out the corresponding review mechanisms. In line with GDPR Articles 13 and 14, the criteria and the related review procedures must be communicated to the data subjects, including notice that this practice may lead to indefinite retention.

Further reading

Relevant GDPR Provisions

  • Recital 39 – principles of data processing
  • Article 5(1)(e) – principles relating to the processing of personal data
  • Articles 13 and 14 – information to be given to the data subject
  • Article 25 – data protection by design
  • Article 89(1) – safeguards when processing for public interest or scientific research purposes

Regina Becker is an ELSI expert at ELIXIR-Luxembourg, hosted by the Luxembourg Centre for Systems Biomedicine (LCSB) at the University of Luxembourg.


Please note that GDPR Briefs neither constitute nor should be relied upon as legal advice. Briefs represent a consensus position among Forum Members regarding the current understanding of the GDPR and its implications for genomic and health-related research. As such, they are no substitute for legal advice from a licensed practitioner in your jurisdiction.
