The Experiments Metadata Checklist is an approved GA4GH product

13 Nov 2025

Developed within the Discovery Work Stream, the Experiments Metadata Checklist establishes a minimum set of properties to standardise descriptions of how genomics experiments are conducted.

By Jaclyn Estrin, GA4GH Senior Science Writer

The Global Alliance for Genomics and Health (GA4GH) is pleased to announce the recent approval of the Experiments Metadata Checklist (Expmeta) as an official GA4GH product.

Developed within the Discovery Work Stream, the Experiments Metadata Checklist was spearheaded by product leads David Bujold (Canadian Centre for Computational Genomics at McGill University) and Peter Woollard (EMBL’s European Bioinformatics Institute), under the guidance of Discovery Work Stream Manager Beatrice Amos (Wellcome Sanger Institute), to establish a common approach to describing experimental sequencing metadata.

Advances in genomic sequencing technologies have increased the speed at which tests and diagnostics can be run. This speed, in conjunction with the vast diversity of experimental data derived from different -omics experiments, has resulted in data generation at an unprecedented scale. These data are multifaceted and complex, and they vary greatly from one sequencing test to the next.

However, the accompanying metadata have lacked consistency: there has not been a consistent way, across platforms and databases, to describe the data and the experimental conditions under which the data were generated.

It is critical for researchers to understand the nature of a sequencing experiment, so that they can better analyse and comprehend the results. Yet, when researchers download genomic datasets, they are often unable to determine how the genomic experiments were conducted. Without this knowledge, researchers cannot easily discover the data they need to run comparative analyses, and the data cannot be readily reused or shared.

The Experiments Metadata Checklist was developed to describe the most important variables within a sequencing experiment, with a focus on properties that are standard across all -omics disciplines. While not in the scope of this standard, other types of metadata, including project and sample level, are also important considerations.

Within a genomic experiment, biological samples are prepared as collections of DNA fragments, called libraries, which are then processed on a sequencing machine. The output data can then be analysed by researchers. The Experiments Metadata Checklist outlines key properties that clarify the context under which the preparation and sequencing phases of an experiment were conducted. This helps other researchers understand how the data were generated, assess whether datasets are compatible for comparison, and confidently reuse the data in their own analyses.

These key properties include information on the experimental technique applied to test the samples, the platform used to sequence the library, the process used to prepare the library, and the protocols used within each experimental stage. Clarifying the instrument or sequencing procedure used to generate the results may also identify biases within the experiment, which can better inform future research.
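As a rough illustration of how a repository maintainer might check a record against the four kinds of properties described above, here is a minimal Python sketch. The class name, field names, and example values are hypothetical and chosen only for illustration; they are not the property names defined in the Experiments Metadata Checklist itself.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ExperimentRecord:
    """Hypothetical record. Field names are illustrative assumptions,
    not the actual Expmeta property names."""
    experimental_technique: Optional[str] = None  # technique applied to the samples
    sequencing_platform: Optional[str] = None     # platform used to sequence the library
    library_preparation: Optional[str] = None     # process used to prepare the library
    protocols: Optional[str] = None               # protocols used at each stage


def missing_properties(record: ExperimentRecord) -> List[str]:
    """Return the names of properties that are absent or blank."""
    missing = []
    for f in fields(record):
        value = getattr(record, f.name)
        if value is None or not value.strip():
            missing.append(f.name)
    return missing


# Example: a record that describes the technique and platform only.
record = ExperimentRecord(
    experimental_technique="whole genome sequencing",
    sequencing_platform="example sequencer model",
)
print(missing_properties(record))
```

A check of this kind would flag records that omit library preparation or protocol details before they are submitted or shared.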

The checklist is designed to be incorporated by bioinformaticians, developers, or repository maintainers into data models, tools, and workflows. The product team’s goal is to ensure consistency within metadata, with applicability across laboratories and research environments that manage sequencing data.

Woollard said, “Having an experimental checklist helps ensure a balance between the metadata a sequence data generator thinks is useful, and the metadata that helps many types of data consumers discover, understand, and make use of this valuable genomic information.”

There is a range of reasons implementers might seek to standardise metadata. For instance, large-scale genomics initiatives may want to increase the accessibility and understandability of their data, and clinical laboratories need to standardise their metadata for highly regulated environments. The entire health research community would benefit from greater consistency in the way experimental data are captured across -omics ecosystems.

Bujold said, “One of the goals of Expmeta is to facilitate federated data discovery of sequencing experiments.” In doing so, the product bridges data models across genomics consortia, repositories, and laboratories, and advances the FAIR principles, which aim to ensure that data are Findable, Accessible, Interoperable, and Reusable.

Early implementers include the Pan-Canadian Genome Library (PCGL) and the US National Institutes of Health (NIH) National Cancer Institute’s Cancer Research Data Commons (NCI CRDC), both of which are dedicated to capturing the most important variables and data within sequencing experiments. The Experiments Metadata Checklist was built in a modular form, so implementers can adapt it and integrate it easily into their projects.

The product team is now turning its focus to engaging with GA4GH Driver Projects to align Expmeta with the experimental data standards of these projects. The team will also host implementer workshops to support uptake and adoption.

In discussing what is next for the standard, Bujold said, “Having addressed the core properties of genomics experiments, we’re now moving toward specific sequencing domains such as single-cell and targeted sequencing. Making the checklist into a computable schema is also a key priority.”

Woollard added, “The wide-scale adoption of the Experiment Metadata Standard will enable data to be both more easily discoverable and interoperable. The increased computational accessibility of the data will allow questioning we have barely dreamt of, for example via AI, particularly as we expand beyond the core properties.”
