Human Exposome Data Standards

Builds standards to help researchers share and combine data on environmental exposures that affect human health

Environmental exposures such as air pollution, diet, physical activity, and social stress can influence the onset of disease. These effects often accumulate slowly over time and can depend on multiple factors such as lifestyle, occupation, and living conditions. Currently, environmental data is captured in different ways by research groups internationally, making it hard to compare or combine this information.

The Human Exposome Data Standards Study Group addresses a persistent problem in global health research: there are not yet consistent, shared ways to describe how the environment affects human health, especially over long periods of time and across different populations.

This Study Group aims to create international standards to help researchers capture and share environmental exposure data more easily and accurately. These standards will help combine genetic and environmental data in population cohorts (e.g. UK Biobank and All of Us Research Program), making it possible to uncover new insights into how one’s surroundings shape health and disease over time.

Benefits

  • Enables harmonised, interoperable exposure data capture across global cohorts for gene-environment research
  • Facilitates virtual cohort creation and AI-driven analyses by standardising complex exposome data

Target users

Researchers and clinicians

Dive deeper into this product!

The GA4GH Study Group on capturing environmental exposure data in human cohorts aims to develop interoperable standards for representing diverse exposure data in population-scale biomedical datasets. The group focuses on both external exposures (e.g. satellite-based pollution indices, meteorological and land-use data, wearables-derived metrics, and social determinants) and internal exposures (e.g. metabolomics, microbiome, diet, and drug intake), drawing on domains such as exposomics and environmental epidemiology.

This effort integrates with the GA4GH ecosystem in two ways. First, it builds on the Phenopackets standard by proposing schemas that encode temporal, geospatial, and biological exposure features alongside phenotype and clinical data. Second, it aligns with the Data Model and Schema Consensus (DaMaSC) guidelines to ensure schema interoperability. By building on existing resources (e.g. ECTO, PhenX, MetaboLights, NEXUS, HBM4EU, and H3Africa) and incorporating controlled vocabularies and ontologies (e.g. ECTO, UK Biobank (UKBB) codes), the Study Group will facilitate harmonised data capture and virtual cohort discovery for gene-environment (GxE) studies across federated infrastructures.
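
To make the idea of encoding temporal, geospatial, and biological exposure features more concrete, here is a minimal sketch of what a single exposure record might look like. It is not a GA4GH or Phenopackets schema: the class names `OntologyTerm` and `ExposureRecord`, every field name, and the identifier `EXAMPLE:0000001` are hypothetical placeholders chosen only for illustration.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date
from typing import Optional


@dataclass
class OntologyTerm:
    """Hypothetical reference to a controlled-vocabulary term."""
    id: str     # placeholder identifier, not a real ontology term
    label: str


@dataclass
class ExposureRecord:
    """Illustrative exposure record; all field names are assumptions."""
    subject_id: str
    exposure: OntologyTerm
    category: str                      # "external" or "internal"
    start_date: str                    # ISO 8601 date, e.g. "2021-01-01"
    end_date: str
    value: Optional[float] = None
    unit: Optional[str] = None
    latitude: Optional[float] = None   # geospatial context, if shareable
    longitude: Optional[float] = None

    def __post_init__(self) -> None:
        if self.category not in ("external", "internal"):
            raise ValueError(f"unknown category: {self.category!r}")
        # Parse the dates so malformed or reversed intervals fail early.
        if date.fromisoformat(self.start_date) > date.fromisoformat(self.end_date):
            raise ValueError("start_date must not be after end_date")

    def to_json(self) -> str:
        # asdict() also converts the nested OntologyTerm into a dict.
        return json.dumps(asdict(self), sort_keys=True)


record = ExposureRecord(
    subject_id="participant-001",
    exposure=OntologyTerm(
        id="EXAMPLE:0000001",
        label="exposure to fine particulate matter",
    ),
    category="external",
    start_date="2021-01-01",
    end_date="2021-12-31",
    value=12.5,
    unit="ug/m3",
    latitude=51.5,
    longitude=-0.1,
)
print(record.to_json())
```

Keeping the time interval, location, and ontology term as separate fields is one possible design; it lets cohorts that cannot share precise coordinates omit them while still reporting what the exposure was and when it occurred.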

The group also explores strategies for linking longitudinal exposure data with electronic health records (EHRs) and other cohort metadata to support AI-based exposome modelling and cross-cohort meta-analyses, with a focus on FAIR (Findability, Accessibility, Interoperability, and Reuse), privacy-respecting data sharing in global health contexts.