16 February 2018
“Genomics will not stop with As, Cs, Ts, and Gs,” says Marc Fiume, Co-Lead of the GA4GH Discovery Work Stream and CEO of DNAstack. “We’re going to have to integrate phenotypic and clinical information and run machine learning to extract insights; physicians will need to make diagnoses and prescribe precision medications; pharma companies will need to use genomics to understand the efficacy of their drugs.”
All of this, Fiume says, requires not just the generation of genomics data, but also the ability to process it in the cloud, connect it to clinical records, and use those discoveries in a systematic way to design a holistic solution for applying genomic medicine in healthcare.
This week, DNAstack, together with Canada’s Genomics Enterprise (CGEN), fellow GA4GH Member Organizations Google and the Centre of Genomics and Policy, and others, launched the Canadian Genomics Cloud (CGC): a national cloud-based infrastructure for genomics initiatives to share data across Canada.
The hope, Fiume says, is that the CGC will lay the groundwork for a future national precision medicine initiative by demonstrating that the Canadian genomics ecosystem is ready to bring together high-powered cloud and sequencing facilities and integrate their systems to enable easy, secure data sharing, discovery, and exchange.
To that end, the CGC will be developing all of its solutions according to GA4GH standards. “The software will build upon a set of principles, many of which are shared by GA4GH,” Fiume says. “For example, we need to do this at scale and we need data and methods to be shared in the cloud.” Tactically, he says, this will look like a suite of GA4GH and clinical application programming interfaces (APIs) that work together to allow data to transition from the sequencer to the scientist and ultimately to the clinician.
In particular, the effort will align itself with the standards developed by the GA4GH Cloud and Discovery Work Streams. “We plan to continue to follow the trajectory of those new Work Streams, expecting that it will take about a year for new APIs to be stable and to have mature implementations at CGC.”
As announced earlier this week in the 2018 Strategic Roadmap, the Cloud Work Stream plans to develop a set of cohesive, interoperable APIs for virtually storing, analysing, and sharing data. The Tool Registry Service (TRS), Workflow Execution Service (WES), Data Object Service (DOS), and Task Execution Service (TES) APIs are designed to work together to allow researchers at disparate institutions to bring their analyses to data stored in the cloud rather than transferring these large datasets between institutions or around the globe.
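The idea of bringing the analysis to the data can be illustrated with a small, self-contained sketch. Everything in it is a hypothetical in-memory stand-in written for this illustration: the class names, method names, site names, and sample records are assumptions and do not reflect the actual TRS, DOS, WES, or TES interfaces. It only shows the division of roles described above: a registry supplies the workflow, an index says where a dataset lives, and an executor at that site runs the workflow locally, so only the result travels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    """Hypothetical record describing where a dataset is stored."""
    object_id: str
    site: str


class ToolRegistry:
    """Stand-in for the role a tool registry plays: look up workflows by name."""

    def __init__(self):
        self._workflows = {}

    def register(self, name, func):
        self._workflows[name] = func

    def get(self, name):
        return self._workflows[name]


class DataObjectIndex:
    """Stand-in for the role a data object service plays: resolve IDs to locations."""

    def __init__(self, objects):
        self._objects = {obj.object_id: obj for obj in objects}

    def resolve(self, object_id):
        return self._objects[object_id]


class SiteExecutor:
    """Stand-in for workflow/task execution at one site; it only sees local data."""

    def __init__(self, site, storage):
        self.site = site
        self._storage = storage  # maps object_id -> locally held records

    def run(self, workflow, data_object):
        if data_object.site != self.site:
            raise ValueError("data is not local to this site")
        return workflow(self._storage[data_object.object_id])


def run_where_data_lives(registry, index, executors, workflow_name, object_id):
    """Send the workflow to the site holding the data; return (site, result)."""
    data_object = index.resolve(object_id)
    workflow = registry.get(workflow_name)
    result = executors[data_object.site].run(workflow, data_object)
    return data_object.site, result


def count_variants(records):
    """Toy analysis: count records whose alternate allele differs from the reference."""
    return sum(1 for record in records if record["alt"] != record["ref"])


def demo():
    registry = ToolRegistry()
    registry.register("count-variants", count_variants)

    index = DataObjectIndex([DataObject("toronto-cohort", "toronto")])

    # Made-up records, present only so the toy workflow has something to count.
    toronto_records = [
        {"ref": "A", "alt": "G"},
        {"ref": "C", "alt": "C"},
        {"ref": "T", "alt": "A"},
    ]
    executors = {
        "toronto": SiteExecutor("toronto", {"toronto-cohort": toronto_records}),
        "vancouver": SiteExecutor("vancouver", {}),
    }
    return run_where_data_lives(
        registry, index, executors, "count-variants", "toronto-cohort"
    )
```

In this sketch, a researcher calling `demo()` never receives the records themselves, only the site name and the count computed there. A real deployment would replace each stand-in with a network service and add authentication and access control, which are left out here.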
The result, says the team, is “highly portable analysis code that ultimately enables ‘FAIR’ science, e.g. findable, accessible, interoperable, and reproducible tools, workflows, and datasets.”
The Discovery Work Stream is also working to make genomics data comply with the “FAIR” principles; however, it is focused on developing APIs that make it possible for researchers at one institution to learn about data at a different institution. Together, the standards put forth by the Discovery Work Stream are intended to enable a global federated network of searchable genomic information.
As part of its mandate, the CGC will enable a national directory of shared data, which will build upon the standards of the Discovery Work Stream. “We’re keen to establish a mechanism for researchers to share data to make it analysable in different organizations,” says Fiume. “Data from Toronto should be analysable in Vancouver and vice versa. The CGC, built on top of GA4GH standards, will help to better connect scientists to data, tools, and each other, to help realize what we expect to be a watershed moment where these connections power systematic discoveries.” The CGC will also allow collaboratories of disparate organizations to share data in a way that is straightforward, secure, and transparent.