Categorical Variation Representation Specification (Cat-VRS)

Provides a data model (and related tools) for categorical variants in genomic knowledge-bases.

When a patient has their DNA sequenced, the clinical test shows specific changes in their genes. But when researchers collect knowledge and make statements about genomic evidence, they typically look at categories of variation.

For example, take the phrase “TP53 R248 mutations.” A study on breast cancer might use that phrase to refer to point mutations at position R248 of TP53.* The phrase is therefore categorical: it aggregates all variants that change the amino acid at position R248 of the TP53 protein.

The rise of categorical variants presents two problems. First, to interpret whether an assayed variant leads to disease, you need to manually search and collate knowledge spread across multiple categorical variants — a slow, error-prone process. Second, complex relationships between categorical variants frustrate efforts to curate and maintain the knowledge-bases that are crucial for connecting genes and disease.

The Categorical Variation Representation Specification (Cat-VRS) development group tackles these problems by developing a formal, computable specification for categorical variants.

*Berns et al. 1998.


Benefits

  • Makes genomic knowledge associated with categorical variants computable and efficiently searchable
  • Promotes more efficient and consistent sharing, curation, and availability of genomic knowledge across GA4GH Work Streams and partner organisations
  • Ensures effective use of categorical variant knowledge — both now and as genomic knowledge grows in the future

Target users

Researchers, clinicians, clinical laboratories, data generators, data custodians, developers, and research institutes

A three panel comic describing the challenges of searching for relevant categorical variants and how Cat-VRS can help.

Community resources

Dive deeper into this product!

Categorical variants pose a challenge for data sharing, storage, curation, and search. A categorical variant is a set of properties that define a domain of observed variants sharing those characteristics. Some common categorical variants describe the effects of splicing behaviour in a gene (e.g. “MET exon 14 skipping mutations”), a shared protein consequence (e.g. “mutations causing an EGFR L858R substitution”), or the expression or activity of a gene product (e.g. “loss of PTEN”).

However, several factors complicate this simple picture. First, new categorical variants are continuously created in the course of genomics research. Second, a single assayed variant belongs to many categories of variation at once. For example, NC_000007.13:g.140453136T>A is simultaneously a nucleotide sequence variant, a gene function variant, a BRAF gene variant, and a BRAF V600E variant. Third, categorical variants themselves have complex, often hierarchical relationships with one another. Finally, existing nomenclatures and knowledge-bases often disagree with each other (and internally) about how to assign variant categories.
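The second point, that one assayed variant falls into several categories at once, can be illustrated with a minimal Python sketch. Everything here is hypothetical and simplified for illustration: the `AssayedVariant` and `Category` classes, their fields, and the membership rules are assumptions, not part of the Cat-VRS data model, and the gene and protein annotations are assumed to be supplied rather than computed. The genomic identifier is copied verbatim from the example above.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class AssayedVariant:
    """Hypothetical, simplified record for one observed variant."""
    hgvs_g: str          # genomic identifier, as written in the text
    gene: str            # gene symbol (assumed to be pre-annotated)
    protein_change: str  # protein consequence (assumed to be pre-annotated)


@dataclass(frozen=True)
class Category:
    """Hypothetical categorical variant: a label plus a membership rule."""
    label: str
    matches: Callable[[AssayedVariant], bool]


# Illustrative categories taken from phrases used in this document.
CATEGORIES = [
    Category("BRAF gene variant", lambda v: v.gene == "BRAF"),
    Category(
        "BRAF V600E variant",
        lambda v: v.gene == "BRAF" and v.protein_change == "V600E",
    ),
    Category(
        "TP53 R248 mutations",
        lambda v: v.gene == "TP53" and v.protein_change.startswith("R248"),
    ),
]


def categories_for(variant: AssayedVariant) -> List[str]:
    """Return the labels of every category whose rule the variant satisfies."""
    return [c.label for c in CATEGORIES if c.matches(variant)]


variant = AssayedVariant("NC_000007.13:g.140453136T>A", "BRAF", "V600E")
matched = categories_for(variant)
print(matched)
```

In this toy setup the single variant satisfies both the broad BRAF category and the narrower V600E category, which also hints at the hierarchical relationships mentioned above: every member of the narrower category is a member of the broader one.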

This group aims to address these challenges by creating a formal model for categorical variants and an associated type logic for distinguishing them. The data model is implemented as a computable JSON schema and is accompanied by a reference Python implementation for creating and validating categorical variation objects.
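To give a rough sense of what "creating and validating" a categorical variation object as JSON might involve, here is a minimal standard-library sketch. The field names (`label`, `constraints`) and the `validate_category` helper are hypothetical and chosen only for illustration. They are not the actual Cat-VRS schema or the reference implementation's API, which should be consulted directly.

```python
from __future__ import annotations

import json

# Hypothetical required fields and their expected Python types.
# These are illustrative assumptions, NOT the actual Cat-VRS schema.
REQUIRED_FIELDS = {"label": str, "constraints": list}


def validate_category(obj: dict) -> list[str]:
    """Return a list of problems; an empty list means the object passed."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in obj:
            problems.append(f"missing field: {field}")
        elif not isinstance(obj[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    return problems


# A toy object for one of the categories named in this document.
category = {
    "label": "mutations causing an EGFR L858R substitution",
    "constraints": [{"gene": "EGFR", "protein_change": "L858R"}],
}

# Round-trip through JSON, as one would when storing or sharing the object.
serialized = json.dumps(category, sort_keys=True)
restored = json.loads(serialized)

problems_ok = validate_category(restored)
problems_bad = validate_category({"label": "loss of PTEN"})
```

A real schema-based workflow would typically validate against the published JSON schema with a schema validator rather than a hand-written check like this one.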



Don't see your name? Get in touch:

  • Larry Babb
    Broad Institute of MIT and Harvard
  • Daniel Puthawala
    Nationwide Children’s Hospital
  • Brendan Reardon
    Dana-Farber Cancer Institute
  • Alex Wagner
    Nationwide Children’s Hospital, Variant Interpretation for Cancer Consortium (VICC)

News, events, and more

Catch up with all news and articles associated with Categorical Variation Representation Specification (Cat-VRS).

12 Jun 2025
GA4GH approves two new products: Categorical Variation Representation Specification (Cat-VRS) and Variant Annotation Specification (VA-Spec)
27 Mar 2025
Variation Representation Specification (VRS) v2.0 is an approved GA4GH product
6 Jan 2025
GKS Work Stream Open House (January 2025)