
8 Nov 2023

Product Documentation

Genomic Knowledge Standards (GKS) Work Stream

Categorical Variation (CatVar)

Categorical Variation (CatVar) Study Group in-progress documentation

Categorical Variation Specification !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

To facilitate search of biomolecular variation, contemporary biomolecular knowledgebases routinely "flatten" variation concepts to a specific context that supports computable matching to assayed variation, and typically provide related contexts to help characterize the intended biological concept. For example, the variant "BRAF V600E" at the CIViC_ resource describes a protein change, but is flattened to a representative genomic change (GRCh37 chr7:g.140453136A>T) and contextualized with corresponding transcript (NM_004333.4:c.1799T>A) and protein (NP_004324.2:p.Val600Glu) descriptions. The representative change is linked to its ClinGen Allele Registry identifier (CAID; CA123643_) to facilitate CAID matching from ClinGen resources.

However, CA123643 is likewise a collection of variation contexts, including many that would typically not be considered equivalent to BRAF V600E: ENST00000497784.1:n.1834T>A, ENST00000647434.1:n.738-3918T>A, and ENST00000642228.1:c.877T>A, for example, are all contexts associated with CA123643, but none results in an altered protein product. Similarly, CA16602531_ can *also* serve as a linked representative genomic change (through NC_000007.14:g.140753335_140753336delinsTT), but again this concept contains several contexts describing the role of the variant that are not applicable to the V600E protein variation.

In addition, more complex cases of variation exist, where the closest approximation of a variation amounts to a simple genomic range. Examples include `BRAF V600 mutations`_, `TP53 truncating mutations`_, and `EGFR exon 19 deletions`_. The concepts associated with these variations (any protein mutation at a codon, any truncating mutation in a gene, and any in-frame deletion in an exon) are not clearly definable using a variation description framework such as VRS or HGVS.

To address these shortfalls, we introduce the Categorical Variation Specification. Categorical Variation captures the semantics that are missing or implied in genomic knowledge resources, providing a framework for expressing how genomic knowledge may match to assayed variation. Much like the VRS objects used in this specification, Categorical Variation classes are designed to instantiate value objects that are readily usable by genomic knowledge search engines. See also the :ref:`CategoricalVariationDescriptor` class for describing Categorical Variation under a consistent paradigm with the :ref:`ValueObjectDescriptor` class.
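
The idea of matching assayed variation to a category can be sketched informally in Python. The sketch below is an illustration only, not part of this specification: the class names ``ProteinChange`` and ``CodonCategory`` and the ``matches`` method are hypothetical and do not correspond to any Categorical Variation or VRS class. It models the "any protein mutation at a codon" concept as an immutable value object and tests assayed protein changes for membership.

```python
from dataclasses import dataclass

# Hypothetical illustration only: these classes are NOT defined by the
# Categorical Variation Specification or by VRS.


@dataclass(frozen=True)
class ProteinChange:
    """An assayed protein-level substitution, e.g. BRAF p.Val600Glu."""

    gene: str
    position: int  # 1-based residue position
    ref: str       # reference amino acid (three-letter code)
    alt: str       # observed amino acid (three-letter code)


@dataclass(frozen=True)
class CodonCategory:
    """A category such as 'BRAF V600 mutations': any change at one codon."""

    gene: str
    position: int
    ref: str

    def matches(self, change: ProteinChange) -> bool:
        # A member must be in the same gene, at the same residue, share the
        # reference residue, and actually alter that residue.
        return (
            change.gene == self.gene
            and change.position == self.position
            and change.ref == self.ref
            and change.alt != change.ref
        )


braf_v600 = CodonCategory(gene="BRAF", position=600, ref="Val")
v600e = ProteinChange("BRAF", 600, "Val", "Glu")
v600k = ProteinChange("BRAF", 600, "Val", "Lys")
g469a = ProteinChange("BRAF", 469, "Gly", "Ala")

for change in (v600e, v600k, g469a):
    print(change, braf_v600.matches(change))
```

Because the dataclasses are frozen, two categories built from the same fields compare equal and can be used as dictionary keys, which is the value-object behavior the paragraph above alludes to. A real implementation would express membership over VRS objects rather than over these simplified fields.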

.. _CategoricalVariation:

Categorical Variation @@@@@@@@@@@@@@@@@@@@@

.. include:: ../defs/catvars/CategoricalVariation.rst

.. _Canonical:

Canonical Variation
###################

.. include:: ../defs/catvars/CanonicalVariation.rst

.. _Complex:

Complex Variation
#################

.. include:: ../defs/catvars/ComplexVariation.rst

.. _CA123643:
.. _CA16602531:
.. _BRAF V600 mutations:
.. _EGFR exon 19 deletions:
.. _TP53 truncating mutations: