Read the 5-year vision statement of the work stream or read the full GA4GH Connect Strategic Plan.
The GA4GH Cloud Workstream (CWS) helps the genomics and health communities take full advantage of modern cloud environments. Its initial focus is on ‘bringing the algorithms to the data’ by creating standards for defining, sharing, and executing portable workflows. Standards under discussion cover workflow definition languages, tool encapsulation, cloud-based task and workflow execution, and cloud-agnostic abstraction of secure data access.
The CWS will build heavily on Docker for packaging executables, and on existing text-based orchestration languages such as the Common Workflow Language (CWL) and the Workflow Description Language (WDL) for stitching those executables together.
The CWS will work with a variety of GA4GH Driver Projects, including the NIH Genomic Data Commons, Genomics England, and other large-scale data processing efforts. These Driver Projects provide clear use cases for new standards and deployment environments for specific implementations. Through this collaboration, the Driver Projects will be able to use the standards to share tools, workflows, and data more effectively with the larger community. Standards we push forward will address the following needs:
To ensure these standards meet the needs of GA4GH Driver Projects, the CWS will build a workflow portability testbed. Each Driver Project will contribute one or two packaged workflows it cares about, together with test input data and an output verifier. The CWS will then ask each Driver Project to run an instance of the testbed in its local environment, contributing back patches where needed to make the testbed portable, and to run all of the test workflows in each of its environments. Success will demonstrate the real-world usability and utility of Cloud Workstream standards.
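The testbed idea described above can be pictured as a matrix: every contributed workflow is run in every environment, and its output verifier decides pass or fail. The following Python sketch is only an illustration under stated assumptions. The names WorkflowCase, run_testbed, cloud_a, and cloud_b are hypothetical and do not come from any GA4GH specification, and the two stand-in environments simply simulate a run where a real testbed would submit the workflow to an execution service.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# All names here are hypothetical illustrations, not part of any GA4GH standard.
Runner = Callable[[str, dict], dict]   # (workflow id, inputs) -> outputs
Verifier = Callable[[dict], bool]      # outputs -> True if they look correct


@dataclass
class WorkflowCase:
    """One packaged workflow with its test input data and output verifier."""
    workflow_id: str
    inputs: dict
    verify: Verifier


def run_testbed(cases: List[WorkflowCase],
                environments: Dict[str, Runner]) -> Dict[Tuple[str, str], bool]:
    """Run every workflow in every environment and record pass/fail."""
    results: Dict[Tuple[str, str], bool] = {}
    for env_name, run in environments.items():
        for case in cases:
            try:
                outputs = run(case.workflow_id, case.inputs)
                results[(env_name, case.workflow_id)] = case.verify(outputs)
            except Exception:
                # A crash in one environment counts as a portability failure.
                results[(env_name, case.workflow_id)] = False
    return results


# Stand-in environments; real ones would call a workflow execution service.
def cloud_a(workflow_id: str, inputs: dict) -> dict:
    return {"line_count": len(inputs["lines"])}


def cloud_b(workflow_id: str, inputs: dict) -> dict:
    raise RuntimeError("container image not available in this environment")


cases = [
    WorkflowCase(
        workflow_id="count-lines",
        inputs={"lines": ["a", "b", "c"]},
        verify=lambda out: out.get("line_count") == 3,
    )
]
results = run_testbed(cases, {"cloud_a": cloud_a, "cloud_b": cloud_b})
for (env_name, workflow_id), passed in sorted(results.items()):
    print(f"{env_name:8} {workflow_id:12} {'PASS' if passed else 'FAIL'}")
```

Keeping the verifier with the workflow, rather than with the environment, means the same correctness check travels to every site that runs the testbed, so a failure points at the environment rather than at differing expectations.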