Large-Scale Genomics (LSG) Work Stream

Develops products to describe, compress, store, encrypt, and transfer genomic data in a scalable way.
Genomic sequencing generates data at an increasingly large scale. As public and national health systems continue to adopt genomic testing and more private companies launch genomic projects, the vast quantity of raw sequencing data will only balloon further. The Large-Scale Genomics (LSG) Work Stream produces robust, standardised ways to describe, store, and access this vital genomic information.


The LSG Work Stream develops efficient formats to store, access, and analyse sequencing reads, genetic variation, and gene expression information.
Contribute to this Work Stream

Subscribe to receive meeting invitations and real-time announcements.


Technical description
Produces standardised file formats and remote access protocols for storing, compressing, encrypting, querying, and sharing genomic data at scale.
Work Stream leads
  • Geraldine Van der Auwera
  • Oliver Hofmann
staff contact
TOOLS & PLATFORMS

Products

Community Resources

Dive deeper into our Work Stream! LSG produces standardised methods for storing, accessing, and analysing genomic data (reads, variants, and expression data) on a large scale. For remote queries, the Work Stream also develops standards for file-based, API-based, cloud-based, and distributed access.


| Date | Info |
| --- | --- |
| 17 Jul 2023 | |
| 12 May 2023 | Tell us what you think! |
| 15 Feb 2023 | Please review and provide your feedback for CRAM v3.1 and refget v2.0 by 14 March 2023. |
| 27 Oct 2022 | Please submit feedback by Thursday, 1 December 2022, at 17:00 UTC. |
| Info | Repeat | Day | Time | Duration |
| --- | --- | --- | --- | --- |
| This group meets to discuss all GA4GH File Formats maintained by the Large-Scale Genomics Work Stream: SAM/BAM/CRAM and VCF/BCF. | Every Two Months | Tuesday | 00:00 UTC | 1 Hour |
| This group meets to discuss all GA4GH File Formats maintained by the Large-Scale Genomics Work Stream: SAM/BAM/CRAM and VCF/BCF. | Every Two Months | Tuesday | 20:00 UTC | 1 Hour |
| This group meets to discuss scaling issues related to the Variant Call Format (VCF). | Every Two Months | Monday | 13:00 UTC | 1 Hour |
| This group meets to discuss scaling issues related to the Variant Call Format (VCF). | Every Two Months | Monday | 21:00 UTC | 1 Hour |
| Working meeting focused on storing and accessing data in encrypted and authenticated states. | Monthly | Tuesday | 14:00 UTC | 1 Hour |
| Working meeting focused on Rust implementation work and benchmarking implementations. Meets every eight weeks on a Tuesday at 10:00pm BST. Alternates with the other htsget meeting, which takes place every eight weeks on a Wednesday at 5:00pm BST. | Every Two Months | Tuesday | 22:00 UTC | 1 Hour |
| Working meeting focused on Rust implementation work and benchmarking implementations. Meets every eight weeks on a Wednesday at 5:00pm BST. Alternates with the other htsget meeting, which meets every eight weeks on a Tuesday at 10:00pm BST. | Every Two Months | Wednesday | 17:00 UTC | 1 Hour |
| Working meeting focused on generating identifiers from reference sequences and specifying an API to retrieve sequences, sub-sequences, and metadata. | Bi-Weekly | Wednesday | 14:00 UTC | 1 Hour |
| Working meeting focused on endpoints for search and retrieval of processed RNA data. Meets on the third Wednesday of the month. | Monthly | Wednesday | 17:00 UTC | 1 Hour |
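
As a rough illustration of what "generating identifiers from reference sequences" can look like, the sketch below derives a checksum-based identifier from a sequence string. It assumes the truncated SHA-512 scheme (the first 24 bytes of the digest, base64url-encoded) and the `SQ.` prefix associated with refget v2.0. The helper name `sequence_identifier` and its whitespace and upper-case normalisation are illustrative assumptions, not text taken from the specification.

```python
import base64
import hashlib


def sha512t24u(data: bytes) -> str:
    """Truncated digest: SHA-512, first 24 bytes, base64url-encoded.

    24 bytes encode to exactly 32 base64 characters, so no padding appears.
    """
    digest = hashlib.sha512(data).digest()
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


def sequence_identifier(sequence: str) -> str:
    """Hypothetical helper: normalise a sequence, digest it, add a prefix.

    Assumption: the sequence is stripped of surrounding whitespace and
    upper-cased before hashing, and the identifier carries an "SQ." prefix.
    """
    normalised = sequence.strip().upper()
    return "SQ." + sha512t24u(normalised.encode("ascii"))


if __name__ == "__main__":
    # The same sequence yields the same identifier regardless of letter case.
    print(sequence_identifier("acgt"))
    print(sequence_identifier("ACGT"))
```

Because the identifier depends only on the sequence content, two sites holding the same reference sequence would compute the same identifier without needing to agree on a name for it.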

Don't see your name? Get in touch:

  • Jeremy Adams
    DNAstack
  • Shakuntala Baichoo
    University of Mauritius
  • Dixie Baker
    Martin, Blanck and Associates
  • Michael Baudis
    University of Zurich
  • Edmon Begoli
    Oak Ridge National Laboratory (ORNL)
  • Nicolas Bertin
    Genome Institute of Singapore
  • James Bonfield
    Wellcome Sanger Institute (WSI)
  • Guillaume Bourque
    McGill University / Université McGill
  • David Bujold
    McGill University / Université McGill
  • Daniel Cameron
    Walter and Eliza Hall Institute of Medical Research
  • Timothe Cezard
    EMBL's European Bioinformatics Institute (EBI)
  • Shu Hui Chen
    NIH National Heart, Lung, and Blood Institute (NHLBI)
  • Guy Cochrane
    Independent Contributor
  • Robert Davies
    Wellcome Sanger Institute (WSI)
  • Richard Durbin
    University of Cambridge
  • Yossi Farjoun
    Lady Davis Institute
  • Mallory Freeberg
    EMBL's European Bioinformatics Institute (EBI)
  • Kais Ghedira
    Institut Pasteur de Tunis
  • Romain Gregoire
    Canadian Centre for Computational Genomics
  • Roderic Guigo
    Centre for Genomic Regulation
  • Sveinung Gundersen
    Centre for Bioinformatics, University of Oslo
  • Yosr Hamdi
    Institut Pasteur de Tunis
  • Reece Hart
    MyOme
  • Muhammad Haseeb
    EMBL's European Bioinformatics Institute (EBI)
  • Michael Hoffman
    Princess Margaret Cancer Centre
  • Oliver Hofmann
    University of Melbourne Centre for Cancer Research
  • David Jackson
    Wellcome Sanger Institute (WSI)
  • Thomas Keane
    EMBL's European Bioinformatics Institute (EBI)
  • Jerome Kelleher
    University of Oxford
  • Rasko Leinonen
    EMBL's European Bioinformatics Institute (EBI)
  • Anders Leung
    Independent Contributor
  • Mike Lin
    Independent Contributor
  • John Marshall
    University of Glasgow
  • Emilio Palumbo
    Centre for Genomic Regulation
  • Martin Pollard
    Wellcome Sanger Institute (WSI)
  • Shaikh Farhan Rashid
    University Health Network, Canadian Distributed Infrastructure for Genomics (CanDIG)
  • Emilio Righi
    Centre for Genomic Regulation
  • Alexander Senf
    Congenica
  • Nathan Sheffield
    University of Virginia
  • Albert Smith
    University of Michigan
  • Jing Su
    Wellcome Sanger Institute (WSI)
  • Sean Upchurch
    California Institute of Technology
  • Roman Valls Guimera
    University of Melbourne Centre for Cancer Research
  • Zhenyu Zhang
    University of Chicago