GA4GH approves two new products: Categorical Variation Representation Specification (Cat-VRS) and Variant Annotation Specification (VA-Spec)

12 Jun 2025

The Global Alliance for Genomics and Health (GA4GH) announces two recent product approvals: Categorical Variation Representation Specification (Cat-VRS) and Variant Annotation Specification (VA-Spec). Both products were developed by members of the Genomic Knowledge Standards (GKS) Work Stream to foster greater precision in the representation of genomic knowledge.

By Jaclyn Estrin, GA4GH Science Writer

The Global Alliance for Genomics and Health (GA4GH) is pleased to announce two newly approved products: the Categorical Variation Representation Specification (Cat-VRS) and Variant Annotation Specification (VA-Spec).

Cat-VRS and VA-Spec are part of a suite of open-source tools developed by the Genomic Knowledge Standards (GKS) Work Stream to standardise how genomic variants are represented and annotated, from specific variants to categories of variants. Implemented together, these products improve the representation of genomic knowledge, helping to accelerate disease diagnosis and advance patient care.

Currently, genomic variants are represented differently across knowledge bases and clinical sequencing labs. This inconsistency makes it challenging for researchers to draw connections between specific genomic variants and how genetic diseases manifest.

Because of these challenges, interpreting a patient’s genome takes extensive effort, requiring six to ten hours of human expert analysis to produce a final report. This work is an important step in the journey from patient sample to patient care: samples are taken for genomic sequencing, scientists analyse and interpret the data to identify the genetic variants relevant to the patient’s symptoms, and doctors then use the results to diagnose the patient and recommend care options.

There are many points in this process where delays can arise. In past decades, genomic sequencing itself was the time- and resource-intensive step. Now, thanks to advances in sequencing technologies, data is generated at a scale that far outpaces genomic analysts’ capacity to interpret it, creating a clinical interpretation bottleneck.

Brendan Reardon (Dana-Farber Cancer Institute; Broad Institute of MIT and Harvard), Cat-VRS Product Co-Lead, explained, “The number of data points per patient continues to go up over time.” This is especially true if a patient has undergone whole exome or whole genome sequencing. He added, “These are orders of magnitude more mutations that clinical geneticists need to sort through.”

The Categorical Variation Representation Specification (Cat-VRS) is a tool that takes most of the computational and analytical burden off variant analysts. Reardon said, “The more we can do to make their job easier, the faster they will be able to make a clinically relevant decision for each patient.”

Cat-VRS builds on the GKS Variation Representation Specification (VRS), which establishes a standardised computational language for integrating and sharing variant knowledge across institutions. Cat-VRS goes one step further by establishing a standard that enables search, analysis, and sharing across categorical properties of variant data.
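
To make the idea of a shared computational language more concrete, the sketch below illustrates one ingredient associated with VRS: computing the same identifier for the same variant regardless of which institution computes it, by hashing a canonical serialisation with the GA4GH sha512t24u digest (SHA-512, truncated to 24 bytes, base64url-encoded). This is a simplified illustration under stated assumptions, not the VRS reference implementation. The `toy_variant_id` helper, the flattened allele dictionaries, and the `seq-1` placeholder are hypothetical, and the real VRS rules also normalise the variant, exclude certain fields, and replace nested objects with their own digests.

```python
import base64
import hashlib
import json


def sha512t24u(blob: bytes) -> str:
    """SHA-512 digest truncated to 24 bytes and base64url-encoded (32 characters)."""
    return base64.urlsafe_b64encode(hashlib.sha512(blob).digest()[:24]).decode("ascii")


def canonical_json(obj: dict) -> bytes:
    """Simplified canonical form: keys sorted at every level, no extra whitespace.

    Assumption: real VRS serialisation applies further rules omitted here.
    """
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def toy_variant_id(allele: dict) -> str:
    """Hypothetical helper: a VRS-style 'ga4gh:VA.' prefix plus a content digest."""
    return "ga4gh:VA." + sha512t24u(canonical_json(allele))


# Two labs describe the same (hypothetical) allele with keys in a different order.
lab_a = {"type": "Allele", "location": {"sequence": "seq-1", "start": 100, "end": 101}, "state": "T"}
lab_b = {"state": "T", "location": {"end": 101, "start": 100, "sequence": "seq-1"}, "type": "Allele"}

if __name__ == "__main__":
    # Same content yields the same identifier, so the records can be matched.
    print(toy_variant_id(lab_a) == toy_variant_id(lab_b))
```

Because the identifier depends only on the variant's content, two institutions can recognise that they are describing the same variant without consulting a central registry.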

In clinical genomics research, doctors often look for shared genetic patterns in patients with similar symptoms to guide treatment decisions. These patterns may appear as the same variant in a very specific genomic location, variants at any genomic location in the same gene, or even broader and more complex categories of variation.
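
One way to picture how a computer could check whether a patient's variant fits one of these patterns is to model a category as a set of constraints, where every constraint left unset makes the category broader. The sketch below is a toy model under that assumption; the `Variant` and `Category` classes, their fields, and the gene names and positions are hypothetical and do not reproduce the Cat-VRS schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    """Hypothetical, simplified record for one observed variant."""
    gene: str
    pos: int  # position on a single hypothetical reference sequence
    alt: str


@dataclass(frozen=True)
class Category:
    """Toy categorical variant: constraints left as None are unconstrained."""
    name: str
    gene: Optional[str] = None
    start: Optional[int] = None  # inclusive
    end: Optional[int] = None    # inclusive
    alt: Optional[str] = None

    def contains(self, v: Variant) -> bool:
        """Return True if the variant satisfies every constraint that is set."""
        if self.gene is not None and v.gene != self.gene:
            return False
        if self.start is not None and v.pos < self.start:
            return False
        if self.end is not None and v.pos > self.end:
            return False
        if self.alt is not None and v.alt != self.alt:
            return False
        return True


# Three increasingly broad (hypothetical) categories.
exact = Category("GENE1 pos 500 to T", gene="GENE1", start=500, end=500, alt="T")
region = Category("any change in GENE1 500-600", gene="GENE1", start=500, end=600)
gene_wide = Category("any change in GENE1", gene="GENE1")

patient = Variant(gene="GENE1", pos=550, alt="A")
matches = [c.name for c in (exact, region, gene_wide) if c.contains(patient)]
# matches == ["any change in GENE1 500-600", "any change in GENE1"]
```

Here the patient's variant is not the specific variant in the narrowest category, but it still matches the region-level and gene-level categories, so knowledge attached to either could be relevant.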

“A general process in science is observing trends and then coming up with general rules to explain those trends,” said Daniel Puthawala (The Abigail Wexner Research Institute at Nationwide Children’s Hospital), Cat-VRS Product Co-Lead. He continued, “In genomics, those rules manifest as these categorical variants.”

Alex Wagner (The Abigail Wexner Research Institute at Nationwide Children’s Hospital), Co-Lead of the GKS Work Stream, further explained, “Three to four percent of patients with non-small-cell lung cancer have variants that cause the MET gene to splice out a specific portion of the gene (exon 14), which is used as a biomarker to inform treatment selection. No one genetic variant is responsible. Instead, the collection of all genetic variants that cause this splicing behavior are relevant — a categorical variant.”

Deciding whether a patient’s variant belongs to one of these categorical variants from genomic knowledgebases has historically been a highly manual process. Because this process is so time-intensive, researchers have to triage which patients receive this type of genomic testing.

Puthawala said, “This is a huge problem, because genomic analysis, particularly in paediatric medicine, is one of the best tools we have for diagnosis, so we need to be able to scale this. Cat-VRS is a data framework that enables computers to procedurally determine how categories of variants are related. This allows researchers to find knowledge about variants of interest faster, alleviating the bottleneck to speed up genomic interpretation.”
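
Determining how categories relate to one another can also be framed as a computation. The following is a minimal sketch, assuming a toy model in which a category is defined only by an optional gene and an optional position range on one hypothetical reference sequence. Under that assumption, category A is broader than category B when every constraint A sets is at least as permissive as B's. The `Category` class and the `subsumes` and `relation` functions are hypothetical names for illustration. The check compares constraints only, so it cannot infer, for example, that a position range lies inside a gene, and real Cat-VRS tooling may relate categories differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Category:
    """Hypothetical toy category: optional gene plus optional inclusive position range."""
    name: str
    gene: Optional[str] = None
    start: Optional[int] = None
    end: Optional[int] = None


def subsumes(broad: Category, narrow: Category) -> bool:
    """True if every variant allowed by `narrow` is also allowed by `broad`."""
    # An unset gene in the broad category accepts any gene; otherwise genes must match.
    if broad.gene is not None and broad.gene != narrow.gene:
        return False
    # Treat unset range bounds as open-ended.
    lo = broad.start if broad.start is not None else float("-inf")
    hi = broad.end if broad.end is not None else float("inf")
    n_lo = narrow.start if narrow.start is not None else float("-inf")
    n_hi = narrow.end if narrow.end is not None else float("inf")
    return lo <= n_lo and n_hi <= hi


def relation(a: Category, b: Category) -> str:
    """Classify how category `a` relates to category `b`."""
    a_covers_b, b_covers_a = subsumes(a, b), subsumes(b, a)
    if a_covers_b and b_covers_a:
        return "equivalent"
    if a_covers_b:
        return "broader"
    if b_covers_a:
        return "narrower"
    return "unrelated or partially overlapping"


gene_wide = Category("any variant in GENE1", gene="GENE1")
region = Category("GENE1 region 1000-1200", gene="GENE1", start=1000, end=1200)
hotspot = Category("GENE1 position 1100", gene="GENE1", start=1100, end=1100)
```

With these definitions, `relation(gene_wide, region)` returns `"broader"` and `relation(hotspot, region)` returns `"narrower"`, so knowledge recorded against the broader category could be surfaced automatically for variants in the narrower one.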

Scaling genomic analysis also requires efficient exchange of the knowledge attached to each variant. While VRS and Cat-VRS represent variants and categories of variants, each of these variants is also annotated with knowledge. The Variant Annotation Specification (VA-Spec) provides a model for explicitly expressing the knowledge associated with a variant.

This includes both high-level, clinically relevant knowledge and foundational biological knowledge. Both types of knowledge carry embedded evidence and provenance information describing where the data originated, what methods were used during the annotation process, and how the data was interpreted.
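
The kind of record described here, a piece of knowledge about a variant bundled with its evidence and provenance, can be sketched as a small data model that serialises to JSON for exchange between systems. This is an illustrative sketch only: the class names, field names, the two knowledge-level labels, and the example identifiers and sources are hypothetical and are not taken from the VA-Spec schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

# Hypothetical labels for the two knowledge types described above.
KNOWLEDGE_LEVELS = {"clinical", "biological"}


@dataclass
class Provenance:
    source: str          # where the data originated
    method: str          # what method was used during annotation
    interpretation: str  # how the data was interpreted


@dataclass
class Evidence:
    description: str
    provenance: Provenance


@dataclass
class Statement:
    subject_variant: str  # identifier of a variant or categorical variant
    predicate: str
    target: str
    knowledge_level: str
    evidence: List[Evidence] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject knowledge levels outside the hypothetical vocabulary.
        if self.knowledge_level not in KNOWLEDGE_LEVELS:
            raise ValueError(f"unknown knowledge level: {self.knowledge_level}")

    def sources(self) -> List[str]:
        """Distinct data sources behind this statement, sorted for stable output."""
        return sorted({e.provenance.source for e in self.evidence})

    def to_json(self) -> str:
        """Serialise the statement, including nested evidence and provenance."""
        return json.dumps(asdict(self), sort_keys=True)


statement = Statement(
    subject_variant="example:variant-1",
    predicate="associated_with",
    target="example:condition-1",
    knowledge_level="clinical",
    evidence=[
        Evidence("case series", Provenance("example-lab-db", "manual curation", "supports association")),
        Evidence("functional assay", Provenance("example-assay-db", "assay readout", "variant disrupts function")),
    ],
)
```

Attaching provenance to each evidence item, rather than to the statement as a whole, lets a receiving system see which source and method support each part of the claim.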

Matthew Brush (The University of North Carolina at Chapel Hill School of Medicine) led the VA-Spec product development, with support from Larry Babb (Broad Institute of MIT and Harvard) and Alex Wagner.

Brush said, “It can be painfully slow to integrate and harmonise knowledge from different sources. The main motivation [for VA-Spec] was the absence of an established computable standard for exchanging variant knowledge and evidence in a way that supported the diverse needs of the research and clinical community, and the challenges that this poses for very important research and clinical enterprises that rely on this type of knowledge.”

VA-Spec helps facilitate clear and efficient exchange of variant knowledge across clinics, institutions, and research systems. Wagner said that VA-Spec encodes knowledge “in a way that allows for communication across different community frameworks…under one interoperable, unified framework that computers understand.”

Brush added, “Having a model that makes that process easier, more efficient, and less prone to misrepresentation and errors in translation will have huge benefits for the community.” The VA-Spec team is currently working to expand the specification, which is built on a generic foundational model, to support additional types of variant knowledge and evidence.

The development of technical standards such as these specifications requires collaboration with implementation partners. Babb said, “Close collaboration with developers and domain experts behind key genomic resources like gnomAD, MaveDB, ClinVar, and CIViC gives high confidence that our standards and tools are both practical and robust, and that shared semantics will enable more accurate, consistent, and reliable use across community pipelines and systems.”

In today’s genomic data ecosystem, where available scientific knowledge is growing exponentially, researchers have access to far more health data than the small fraction of information that was previously available. Shared standards for genomic variant representation and annotation provide a foundation to accelerate discovery and application of distributed knowledge resources, enabling researchers to diagnose patients more quickly and deliver faster, more precise care.
