#CRAM4GH Twitter chat: recap

9 Apr 2019

On Friday April 5th, GA4GH held a live Twitter chat on the CRAM file format for genomic data compression. Under the #CRAM4GH hashtag, three guest experts answered questions from the community about the file format: James Bonfield, Principal Software Developer at the Wellcome Sanger Institute and lead CRAM maintainer; Thomas Keane, Team Leader at EMBL-EBI and co-lead of the GA4GH Large Scale Genomics Work Stream; and Ewan Birney, Director of EMBL-EBI and GA4GH chair.

Visit the #CRAM4GH conversation reel to explore the full Twitter chat, or view the highlights below.

CRAM is…

A file format that uses a range of compression algorithms to store genomic sequencing data compactly. By storing only the parts of a sequence that differ from a reference sequence, CRAM keeps files small and easily accessible.
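To make the idea of storing only differences from a reference concrete, here is a minimal Python sketch. It is a toy illustration, not the CRAM encoding itself: the function names, the record layout, and the example sequences are all hypothetical, and the real format handles insertions, deletions, quality scores, and much more.

```python
def encode_read(reference: str, position: int, read: str):
    """Store a read as (position, length, mismatches) relative to the reference.

    Toy model only: assumes the read aligns without insertions or deletions.
    """
    window = reference[position:position + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    # Keep only the offsets where the read disagrees with the reference.
    diffs = [
        (offset, base)
        for offset, (ref_base, base) in enumerate(zip(window, read))
        if base != ref_base
    ]
    return position, len(read), diffs


def decode_read(reference: str, record) -> str:
    """Rebuild the original read from the reference and the stored differences."""
    position, length, diffs = record
    bases = list(reference[position:position + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


# Hypothetical example data: a 10-base read aligned at position 4 with one mismatch.
reference = "ACGTACGTACGTACGT"
read = "ACGTTCGTAC"
record = encode_read(reference, 4, read)
restored = decode_read(reference, record)
```

In this sketch the ten bases of the read collapse to a position, a length, and a single mismatch, and decoding against the same reference restores the read exactly.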

To start the ball rolling… there has been confusion over the years as to what CRAM can and cannot do (even where it came from).

For more info, see “Cram dispelling the myths”:https://t.co/4vxB1VyYWQ #CRAM4GH

— James Bonfield (@BonfieldJames) April 5, 2019

CRAM’s impact extends beyond the genomics research community.

As genome sequencing becomes more routine, storing data efficiently and sustainably is essential. CRAM has immediate savings opportunities for the wider community:

Genome sequencing is well on the way to be coming a routine clinical assay. BUT genomes are big, e.g. a single human genome in BAM is ~100GB. With CRAM this can be reduced by 50-60%, so that’s immediate $$ savings and enables faster transfer of human data. #CRAM4GH

— Thomas Keane (@drtkeane) April 5, 2019

…to more than the genomics community. It is a fundamental step for implementing genomics in public health

— Mauro Petrillo (@PetrilloMauro) April 7, 2019

Although converting requires careful attention to data access controls and permissions, clinical genomics collaborators also stand to benefit from CRAM. In response to a question from @GeneFiddler (Hywel Williams of Cardiff University) on the feasibility of adding CRAM support to clinical diagnostic pipelines, all three panelists emphasized the interoperability already present in widely used tools:

Good question Hywel. If the pipeline is predominantly using samtools/picard/GATK or their libraries (htslib, htsjdk) then it should be quite painless. If you have parts that don’t fit well, then it’s perfectly fine to use BAM and convert to CRAM at the end.

— James Bonfield (@BonfieldJames) April 5, 2019

Hi Hywel, CRAM is already compatible with some of the most highly used NGS tools, e.g. Samtools/htslib/htsjdk/GATK. I guess you would need to audit your pipeline to find out which tools are/aren’t already compatible. #CRAM4GH

— Thomas Keane (@drtkeane) April 5, 2019

They are probably using BAM now via GATK and Samtools. It should be an easy replacement but in a clinical context you’d need to do it carefully. I would audit the pipeline for the access of BAMs, check by hand each access works with CRAM #CRAM4GH >>

— Ewan Birney (@ewanbirney) April 5, 2019
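As a rough illustration of the "use BAM and convert to CRAM at the end" approach the panelists describe, the sketch below wraps a samtools conversion in Python. It assumes samtools is installed and on the PATH; the helper names and the file names are hypothetical, and a clinical pipeline would need its own validation around this step.

```python
import shutil
import subprocess
from pathlib import Path


def bam_to_cram_command(bam: Path, reference: Path, cram: Path) -> list:
    """Build the samtools command that rewrites a BAM file as CRAM.

    samtools view options used: -C writes CRAM output, -T names the
    reference FASTA, and -o names the output file.
    """
    return ["samtools", "view", "-C", "-T", str(reference), "-o", str(cram), str(bam)]


def convert_bam_to_cram(bam: Path, reference: Path, cram: Path) -> None:
    """Run the conversion, failing loudly if samtools is missing or errors."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools not found on PATH")
    subprocess.run(bam_to_cram_command(bam, reference, cram), check=True)


# Hypothetical file names, shown only to illustrate the resulting command.
command = bam_to_cram_command(Path("sample.bam"), Path("ref.fa"), Path("sample.cram"))
```

Calling `convert_bam_to_cram` with real paths would perform the conversion. Keeping the command construction separate makes it easy to audit exactly what a pipeline will run, in line with the panelists' advice to check each step carefully.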

Open standards ensure the ongoing integrity and quality of scientific data.

Like all GA4GH standards, CRAM is developed and maintained within an open forum, enabling greater collaboration and evolution of data analysis. Our panelists shared their reasons for supporting open standards and open software:

Many reasons. First it taps into the broad creativity of academic and commercial community. Second it enables science. Third it is transparent for one of the key data types we have as individuals #CRAM4GH

— Ewan Birney (@ewanbirney) April 5, 2019

With so many human genomes that will be generated across the world, we absolutely need open standards to ensure that the data can be analysed with the best algorithms. Open standards enable this to happen by ensuring interoperability. #CRAM4GH

— Thomas Keane (@drtkeane) April 5, 2019

I am a firm believer in freely available data for scientific research.

Data can only truly be free if the file I/O software is also free for all users, which in turn means the format itself has to be free of royalty-based patents. #CRAM4GH

— James Bonfield (@BonfieldJames) April 5, 2019

In the future, CRAM will be smaller and faster.

The current version of CRAM (V3.0) reduces disk space by 30-50% compared to BAM. Bonfield envisions even greater storage savings with future versions, as well as support for additional data types:

CRAM roadmap. 3.0 (now) -> 3.1 (archive mode; summer?) -> 4.0 (features; longer chromosomes, faster long-read support).

3.1 is expected 10-30% smaller than 3.0, depending on input data.

Eg see BAM v CRAM3 v CRAM3.1 for HiSeq2K and NovaSeq. (All lossless)#CRAM4GH pic.twitter.com/vZX4rRy8Eg

— James Bonfield (@BonfieldJames) April 5, 2019

Community members are invited to share ideas for CRAM V4 here:

James and the folks in the GA4GH file formats group have a public list of potential improvements coming down the pipe for CRAM v4. Again, all publicly documented and welcome contributions https://t.co/d9HBRLo0Oc #CRAM4GH

— Thomas Keane (@drtkeane) April 5, 2019

Learn more about CRAM at ga4gh.org/cram/.
