refget

Uses a checksum algorithm to unambiguously identify reference sequences for genomic analysis

The first step of any genomic analysis is mapping the new sequence data to a reference sequence — a list of three billion base pairs that have been generally accepted as “normal” for a given population or subgroup. However, standard conventions for naming and identifying reference sequences are lacking. Different organisations refer to the same sequence differently. Developed by the Large Scale Genomics (LSG) Work Stream, the refget API provides a framework to retrieve reference sequences by using an algorithm to derive a unique identifier. The identifier can then be used to verify the integrity of the reference sequence.

All sequencing-based genomic analysis uses a genomic reference sequence, a baseline against which variations are observed. There are multiple human reference sequences of increasing accuracy, and they are not named consistently: two organisations may refer to the same sequence using different names, or reuse a name to refer to different reference releases. Reliable, reproducible genomic analysis depends on clear provenance back to reference data. refget enables unambiguous access to reference genomic sequences across different databases and servers by using a checksum identifier derived from the sequence content itself.
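As a rough illustration of a content-derived identifier, the sketch below computes two digests of a sequence using only the Python standard library: an MD5 hex digest, and a SHA-512 digest truncated to 24 bytes and base64url-encoded with an `SQ.` prefix. The choice of digests, the `SQ.` prefix, and the normalisation step (uppercase, whitespace removed) are assumptions based on a general reading of the published refget specification and should be checked against it. The function names are hypothetical and are not part of any refget library.

```python
import base64
import hashlib


def normalise(sequence: str) -> bytes:
    """Assumed normalisation: drop whitespace and uppercase the bases,
    so that line wrapping or letter case does not change the digest."""
    return "".join(sequence.split()).upper().encode("ascii")


def md5_identifier(sequence: str) -> str:
    """Hex-encoded MD5 digest of the normalised sequence."""
    return hashlib.md5(normalise(sequence)).hexdigest()


def sha512t24u_identifier(sequence: str) -> str:
    """SHA-512 digest truncated to its first 24 bytes, base64url-encoded.
    24 bytes encode to exactly 32 characters, so no '=' padding appears."""
    digest = hashlib.sha512(normalise(sequence)).digest()
    return "SQ." + base64.urlsafe_b64encode(digest[:24]).decode("ascii")


def verify(sequence: str, expected_identifier: str) -> bool:
    """Recompute the identifier from the content and compare it with the
    identifier that was used to request the sequence."""
    return sha512t24u_identifier(sequence) == expected_identifier


if __name__ == "__main__":
    identifier = sha512t24u_identifier("ACGT")
    print(identifier)
    print(verify("acgt\n", identifier))  # same content, different formatting
    print(verify("ACGA", identifier))    # different content
```

Because the identifier depends only on the sequence content, two servers that hold the same sequence under different names would produce the same identifier, and a client can recompute it after download to check that the sequence arrived intact.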

Benefits

  • Reliably access reference sequences for genomic studies
  • Create unambiguous identifiers for reference sequences

Target users

Researchers

Image summary: The refget API helps researchers derive reference genomic sequences with precision.
Product Leads
  • Andy Yates
  • Timothe Cezard
  • Nathan Sheffield

Community Resources

Dive deeper into this Product!


15 Feb 2023
Please review and provide your feedback for CRAM v3.1 and refget v2.0 by 14 March 2023.

Working meeting focused on generating identifiers from reference sequences and specifying an API to retrieve sequences, sub-sequences, and metadata
  • Repeat: Bi-Weekly
  • Day: Wednesday
  • Time: 14:00 UTC
  • Duration: 1 Hour

Date and version
  • 27 Feb 2023
  • 9 Mar 2020

Related Driver Projects and Organisations

  • European Joint Programme on Rare Disease (EJP RD)
  • ELIXIR, ELIXIR Beacon
  • ENA / EVA / EGA, EMBL's European Bioinformatics Institute (EBI), Centre for Genomic Regulation

  • Jeremy Adams
    DNAstack
  • Shakuntala Baichoo
    University of Mauritius
  • Timothe Cezard
    EMBL's European Bioinformatics Institute (EBI)
  • Robert Davies
    Wellcome Sanger Institute (WSI)
  • Sveinung Gundersen
    Centre for Bioinformatics, University of Oslo
  • Reece Hart
    MyOme
  • Muhammad Haseeb
    EMBL's European Bioinformatics Institute (EBI)
  • Oliver Hofmann
    University of Melbourne Centre for Cancer Research
  • Rasko Leinonen
    EMBL's European Bioinformatics Institute (EBI)
  • John Marshall
    University of Glasgow
  • Nathan Sheffield
    University of Virginia

News, events, and more

Catch up with all news and articles associated with refget.

8 Jul 2021
GA4GH Standards in a Global Learning Health System
5 Dec 2018
Using the GA4GH toolkit: refget API for retrieving reference sequences via checksum
1 Nov 2018
GA4GH releases refget API for accessing genomic reference sequence data