GA4GH WES API enables portable genomic analysis

26 Oct 2018

The GA4GH Cloud Work Stream has announced version 1 of its Workflow Execution Service (WES) API — a protocol for running the same genomic data analysis in multiple cloud environments. The announcement was made at the GA4GH 6th Plenary Meeting in Basel, Switzerland earlier this month.

“WES enables users to define workflows in a standard way, package them up, and then hand them to workflow engines that live in many different places,” said David Glazer, Engineering Director at Verily Life Sciences and co-lead of the Cloud Work Stream. “You should be able to run your workflow wherever you want on whatever data you have and be confident you’ll get the same answer — WES allows you to do just that.”

WES is part of a larger framework to seamlessly bring algorithms to genomic data rather than attempting to transfer that data across national and institutional boundaries, which is cumbersome, resource-intensive, and limited by regulatory constraints. “Our task as a work stream is to come up with the standards to facilitate the definition of these portable workflows, the sharing of them, and the execution of them,” said Glazer.

Taken as a whole, this framework will have positive impacts for both tool developers and tool users — developers will only need to package their tools once to make them available to the broad community, and researchers will have ready access to more tools as well as the ability to run compatible analyses across data in many places. WES addresses that last step in the framework — the execution of portable workflows in a preferred compute environment using data in a preferred storage environment.

In addition to developing WES, the Cloud Work Stream has spent the twelve months since its founding building an interoperability testbed to show that its framework works. “We want not only to demonstrate that the workflows are interoperable, but also to confirm they are actually successful,” said Brian O’Connor, Consulting Director at the UCSC Computational Genomics Platform and co-lead of the Cloud Work Stream.

The testbed depends on an “orchestrator” that sits between a library of workflows and a series of cloud environments where those workflows can be run. In this role, it tests workflows directly in different WES environments: it selects a workflow, then selects a cloud environment that has implemented WES in which to run the analysis. The orchestrator monitors the workflow as it runs and reports details that the Work Stream uses to confirm WES is working in each environment.
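
The monitoring step might look something like the following minimal sketch. It is an illustration only, not the testbed's actual code: `monitor_run` and the `get_state` callable are hypothetical names, and the state strings are assumed to follow the run states named in the WES 1.0 specification. A real orchestrator would query a WES endpoint where this sketch calls `get_state`.

```python
import time
from typing import Callable, List

# Run states treated as final here (assumed from the WES 1.0 state list).
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def monitor_run(get_state: Callable[[], str],
                poll_seconds: float = 0.0,
                max_polls: int = 100) -> List[str]:
    """Poll a run until it reaches a terminal state.

    Returns the sequence of distinct states observed, in order.
    `get_state` is a hypothetical stand-in for a status request
    to one WES environment.
    """
    history: List[str] = []
    for _ in range(max_polls):
        state = get_state()
        # Record a state only when it differs from the previous one.
        if not history or history[-1] != state:
            history.append(state)
        if state in TERMINAL_STATES:
            return history
        time.sleep(poll_seconds)
    raise TimeoutError("run did not reach a terminal state")


# Usage with a simulated run instead of a live environment.
simulated = iter(["QUEUED", "INITIALIZING", "RUNNING", "RUNNING", "COMPLETE"])
print(monitor_run(lambda: next(simulated)))
# prints ['QUEUED', 'INITIALIZING', 'RUNNING', 'COMPLETE']
```

The returned history is the kind of detail an orchestrator could report back for each environment.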

To date, four organizations have made their implementations of WES available to the testbed: Veritas Genetics (running on the Microsoft Azure cloud), the Human Cell Atlas (running on the Amazon Web Services cloud), the Broad Institute of MIT and Harvard (running on the Google Cloud Platform), and Illumina. The interoperability testbed shows that WES is working in all four cases by running test workflows from three GA4GH Driver Projects and collaborators (Human Cell Atlas, Australian Genomics, and TOPMed), along with another contributed by the PCAWG project. These workflows were paired with test data for which the results of the analysis are already known, making it easy to verify workflow portability across sites.

“I can say, ‘Here’s a test set of inputs to run the workflow on; if successful, you should get these outputs’,” said O’Connor. “The orchestrator runs every test against every environment and we can see how well each combination fares.”
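
The "every test against every environment" idea O’Connor describes can be sketched as a simple test matrix. Everything below is hypothetical and for illustration: the `run_matrix` function, the environment names, and the toy GC-counting "workflow" are assumptions, and each environment is simulated by a plain Python function rather than a real WES endpoint.

```python
from itertools import product
from typing import Callable, Dict, Tuple

# A runner stands in for one WES environment: inputs in, outputs out.
Runner = Callable[[dict], dict]


def run_matrix(tests: Dict[str, Tuple[dict, dict]],
               environments: Dict[str, Runner]) -> Dict[Tuple[str, str], bool]:
    """Run every test in every environment and compare to known outputs.

    `tests` maps a test name to (inputs, expected_outputs).
    Returns {(test_name, env_name): passed}.
    """
    results: Dict[Tuple[str, str], bool] = {}
    for (test_name, (inputs, expected)), (env_name, runner) in product(
            tests.items(), environments.items()):
        try:
            passed = runner(inputs) == expected
        except Exception:
            # A crashed run counts as a failed combination.
            passed = False
        results[(test_name, env_name)] = passed
    return results


# Toy workflow: count G and C bases in a sequence.
tests = {"gc_count_test": ({"sequence": "GATTACA"}, {"gc_count": 2})}

environments = {
    # Behaves as expected.
    "env_a": lambda inp: {"gc_count": sum(b in "GC" for b in inp["sequence"])},
    # Deliberately wrong: counts only G, to show a failing combination.
    "env_b": lambda inp: {"gc_count": inp["sequence"].count("G")},
}

results = run_matrix(tests, environments)
for (test_name, env_name), passed in results.items():
    print(test_name, env_name, "PASS" if passed else "FAIL")
```

Because the expected outputs are fixed in advance, a mismatch in any cell of the matrix points directly at the workflow and environment pair that needs attention.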

Over the coming year, the Cloud Work Stream will work to publish standards that complement WES in making it possible to run any tool on any data: an API for selecting tools (Tool Registry Service, or TRS) and an API for accessing data across multiple clouds (Data Repository Service, or DRS).
