Diversity in Datasets

Shares actionable recommendations to promote global diversity in datasets within genomic research

For all people to truly benefit from scientific advancement and the full potential of genomics, diverse datasets must be used for research and clinical care. Yet there is a critical lack of dataset diversity at every stage of genomic research. The GA4GH policy tool gives research teams a structured, actionable framework to define and pursue diversity in human genomic and genetic studies, emphasising that diversity is a means, not an end. It asks teams to clarify which types of diversity are relevant (e.g. ancestry, geography, socio-economic status, health status, gender, environment) and why they matter for their specific research goals. It then guides them on how to embed those diversity considerations across the full data lifecycle, from study design through participant recruitment, data collection, analysis, and sharing to interpretation. The framework also highlights the need to engage with communities, address barriers to inclusion, avoid conflating categories (e.g. genetic ancestry versus ethnicity), and ensure that diversity strategies align with normative values such as justice, beneficence, and respect for persons. In doing so, the tool shifts genomic studies from simply striving for representativeness toward making deliberate, context-sensitive design choices that maximise the scientific and equity-oriented value of diversity.

There is a well-known bias in current genomic datasets, with much of the data coming from individuals of European ancestry. This lack of diversity runs across the entire pipeline of genomic research. To optimise the benefits of advances in genomic research, research infrastructures must strive to become equitable and inclusive by considering more diverse data. Developed by the GA4GH Regulatory & Ethics Work Stream (REWS), the Diversity in Datasets policy explores types of diversity and shares actionable recommendations to help researchers uphold diversity in their research and findings.

Benefits

  • Shares guidance on how to best promote diverse datasets in associated research
  • Promotes an international lens on meaningful diversity in datasets
  • Encourages researchers to foster more inclusive, robust research practices so that the benefits of genomic advances can be applied more equitably

Target users

Researchers, data generators, data custodians, data access committees, ethics review committees, data protection authorities, funding agencies, security officers, and research institutes

Community resources


Don't see your name? Get in touch:

  • Mutiat Afolabi
    Wellcome Sanger Institute (WSI)
  • Shu Hui Chen
    NIH National Heart, Lung, and Blood Institute (NHLBI)
  • Megan Doerr
    Sage Bionetworks
  • Tina Hernandez-Boussard
    Stanford University
  • Jacob Shujui Hsu
    National Taiwan University
  • Sumit Jamuar
    Global Gene Corp
  • Saumya Jamuar
    KK Women's and Children's Hospital
  • Beatrice Kaiser
    McGill University / Université McGill, Centre of Genomics and Policy
  • Anna Lewis
    Harvard University
  • Zane Lombard
    University of the Witwatersrand, National Health Laboratory Service
  • Maxine Mackintosh
    Genomics England
  • Maili Raven-Adams
    The Nuffield Council on Bioethics
  • Alham Saadat
    Broad Institute of MIT and Harvard
  • Sikha Singh
    Association of Public Health Laboratories
  • Diya Uberoi
    McGill University / Université McGill, Centre of Genomics and Policy

News, events, and more

Catch up with all news and articles associated with Diversity in Datasets.

News
12 Nov 2024
What do we mean by “more diverse” data?: GA4GH’s new product encourages a holistic approach to diversity in datasets
News
28 May 2024
GA4GH submits comments on the WHO’s draft principles for human genome access, use, and sharing