2 December 2019
Genomics England has implemented htsget, the standard GA4GH API, to serve all of its genomic data from the 100,000 Genomes Project and the Genomic Medicine Service. The htsget standard was developed by the Large Scale Genomics Work Stream of the Global Alliance for Genomics and Health (GA4GH). It is a genomic data retrieval specification that lets users stream data for selected regions of the genome, so they no longer need to download the entire files in which the data resides. Genomics England is using the API to stream all of its VCF, BAM, and CRAM files, thereby providing direct access to the raw data for clinical care teams across the UK.
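As a rough illustration of what requesting a selected region involves, the Python sketch below builds an htsget ticket request and assembles the data blocks that the returned ticket lists. It follows the general shape of the public htsget specification and is not taken from the Genomics England implementation; the host `htsget.example.org`, the record ID `sample1`, and the helper names are hypothetical.

```python
import base64
from urllib.parse import quote, unquote_to_bytes, urlencode
from urllib.request import Request, urlopen


def build_htsget_url(base_url, record_id, reference_name, start, end,
                     fmt="BAM", endpoint="reads"):
    """Build a ticket-request URL for one region (0-based start, exclusive end)."""
    query = urlencode({
        "format": fmt,
        "referenceName": reference_name,
        "start": start,
        "end": end,
    })
    return f"{base_url.rstrip('/')}/{endpoint}/{quote(record_id, safe='')}?{query}"


def fetch_block(entry):
    """Return the bytes for one entry of the ticket's "urls" list."""
    url = entry["url"]
    if url.startswith("data:"):
        # Small blocks (for example file headers) may be inlined as data URIs.
        header, _, payload = url.partition(",")
        if header.endswith(";base64"):
            return base64.b64decode(payload)
        return unquote_to_bytes(payload)
    # Other blocks are fetched over HTTP with the headers the ticket supplies.
    request = Request(url, headers=entry.get("headers", {}))
    with urlopen(request) as response:
        return response.read()


def assemble(ticket):
    """Concatenate all blocks, in order, into one valid file for the region."""
    return b"".join(fetch_block(entry) for entry in ticket["htsget"]["urls"])


# Hypothetical server and record ID, shown only to illustrate the URL shape.
url = build_htsget_url("https://htsget.example.org", "sample1", "chr1", 100, 200)
```

The server answers the ticket request with a JSON document, and the client only ever transfers the blocks that cover the requested region.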
“As a streaming technology, htsget allows our systems to only move the data required for analysis,” said Augusto Rendon, Chief Bioinformatician at Genomics England. “Using a standard means that many downstream tools such as genome browsers or analysis pipelines can connect to the data without cumbersome intermediates.”
The htsget API is being consumed in several ways across Genomics England. Laboratory scientists can now browse genomic data more efficiently via software packages such as Genoverse and IGV.js. Moreover, services are now being built to call the htsget API directly. One such service is used for sample matching, to confirm that the DNA held at the requesting lab is from the same individual as the genome held at Genomics England.
Because htsget offers out-of-the-box compatibility with samtools/htslib, it is very simple to integrate into other systems. Importantly, it also helps to abstract services more cleanly by avoiding awkward dependencies on file systems.
“Implementing htsget into our ecosystem has been a very positive experience. It was very straightforward to integrate it with pieces of software that were already using samtools/htslib,” said Antonio Rueda, head of the interpretation platform at Genomics England. “This demonstrates the importance of converging on standard and unified protocols. Overall, this means that the labs using these data to interpret our patients’ genomes can simplify their workflows, deploy a wider range of tools, and hopefully diagnose more patients.”
In the hope that others can benefit from this standard, Genomics England has made the code open source. The implementation is available at https://gitlab.com/genomicsengland/htsget/gel-htsget