How we work

Cloud vision statement

Read the Workstream's 5-year vision statement, or read the full GA4GH Connect Strategic Plan.

Motivation and Mandate

The GA4GH Cloud Workstream (CWS) helps the genomics and health communities take full advantage of modern cloud environments. Its initial focus is on ‘bringing the algorithms to the data’ by creating standards for defining, sharing, and executing portable workflows. Standards under discussion include workflow definition languages, tool encapsulation, cloud-based task and workflow execution, and cloud-agnostic abstraction of secure data access.

Existing Standards

The CWS will build heavily on Docker for packaging executables, and on existing text-based orchestration languages such as the Common Workflow Language (CWL) and the Workflow Description Language (WDL) for stitching those executables together.

Proposed Solution

The CWS will work with a variety of GA4GH Driver Projects, including the NIH Genomic Data Commons, Genomics England, and other large-scale data processing efforts. These Driver Projects provide clear use cases for new standards, as well as deployment environments for specific implementations. As a result of our collaboration, the Driver Projects will be able to use these standards to share tools, workflows, and data more effectively with the larger community. The standards we push forward will address the following needs:

  • Defining portable workflows: tool builders need to be able to package their tools for reuse. The CWS will build on existing standards to allow workflows built by one researcher to be used by many others.
  • Sharing portable workflows: tool builders need to be able to offer their tools for others to use, and tool consumers need to be able to discover the tools they need. The CWS will support app-store-like functionality, including support for controlling access if tool builders choose.
  • Executing portable workflows: once a tool consumer has selected a tool, they need to be able to run it in their preferred compute environment, pointing at input and output data in their preferred storage environment. The CWS will define execution standards that will be easy for developers of existing workflow runners (e.g. Toil, Cromwell, Rabix) to support.

To ensure these standards meet the needs of GA4GH Driver Projects, the CWS will build a workflow portability testbed environment. Each Driver Project will contribute one or two packaged workflows it cares about, together with test input data and an output verifier. The CWS will then ask each Driver Project to run an instance of the testbed in its local environment, contributing back any patches needed to make it portable, and to run all of the test workflows in all of its environments. Success will demonstrate the real-world usability and utility of Cloud Workstream standards.
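
The testbed idea described above can be pictured as a matrix of workflows against environments. The following is a minimal illustrative sketch in Python, not the CWS testbed itself: the names `PackagedWorkflow`, `run_testbed`, and `portable_workflows` are hypothetical, and each environment is modelled as a plain function that takes a workflow and returns its outputs, which is an assumption made purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class PackagedWorkflow:
    """Hypothetical stand-in for one contributed workflow."""
    name: str
    test_input: dict                 # test input data shipped with the workflow
    verify: Callable[[dict], bool]   # output verifier shipped with the workflow


# Assumption: an environment is any callable that runs a workflow on its
# test input and returns the outputs as a dict.
Environment = Callable[[PackagedWorkflow], dict]


def run_testbed(
    workflows: List[PackagedWorkflow],
    environments: Dict[str, Environment],
) -> Dict[Tuple[str, str], bool]:
    """Run every workflow in every environment and record pass/fail."""
    results: Dict[Tuple[str, str], bool] = {}
    for env_name, run in environments.items():
        for wf in workflows:
            try:
                passed = bool(wf.verify(run(wf)))
            except Exception:
                # A crash in the environment or verifier counts as a failure.
                passed = False
            results[(wf.name, env_name)] = passed
    return results


def portable_workflows(results: Dict[Tuple[str, str], bool]) -> List[str]:
    """Return the sorted names of workflows that passed in every environment."""
    passed_everywhere: Dict[str, bool] = {}
    for (wf_name, _env_name), passed in results.items():
        passed_everywhere[wf_name] = passed_everywhere.get(wf_name, True) and passed
    return sorted(name for name, ok in passed_everywhere.items() if ok)
```

In this sketch, a workflow counts as portable only if its own verifier accepts the outputs from every environment, which mirrors the goal of running all test workflows in all environments.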