Data Connect

Connects researchers to information on datasets regardless of how the data was originally stored or formatted

The ability to combine datasets from multiple projects and institutions all over the world can greatly accelerate scientific research. In the healthcare and biomedical research ecosystems, however, data generators store their data in a myriad of formats, structures, and locations. These technical differences make it challenging for researchers to search, aggregate, and analyse data from multiple sources. Likewise, it is expensive for data providers to restructure existing data to fit the requirements of highly structured, highly opinionated sharing systems.

As a result, the data goes unshared and undiscovered. Developed by the GA4GH Discovery Work Stream, the Data Connect API aims to address this problem by providing a simple, flexible mechanism to connect researchers to information about otherwise disparate datasets, regardless of how the data was originally stored or formatted.

  • Works with any data that can be serialised as an array of JSON objects
  • Supports federation
  • Can be implemented across a large variety of data stores

Target users

Researchers, data custodians, and developers

Image summary: Data Connect helps researchers compare information across datasets stored in otherwise incompatible file formats and database schemas.
Work Stream
  • Discovery
Product Leads
  • Miro Cupak
  • Jonathan Fuerth

Community resources

Data Connect is general-purpose middleware for building federated, search-based applications. The API consists of two components. The first allows data providers to describe the datasets they hold, and the underlying data model, using a common JSON schema. The second enables researchers to write custom queries to learn more about datasets of interest. The two components are independent of one another, so a provider can implement the first without the second.
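As a rough illustration of the two components, the Python sketch below pairs a table description with a small array of JSON objects, and then builds a query request body. It makes no network calls. The table name, the field names (`name`, `description`, `data_model`, `query`, `parameters`), and both helper functions are assumptions made for this example and are not taken from the text above, so the published Data Connect specification should be consulted for the actual request and response shapes.

```python
import json

# Component 1 (assumed shape): a provider describes a table and its data
# model with a JSON schema. All names and values here are hypothetical.
TABLE_INFO = {
    "name": "demo.subjects",
    "description": "Illustrative table of study subjects",
    "data_model": {
        "type": "object",
        "properties": {
            "subject_id": {"type": "string"},
            "age": {"type": "integer"},
        },
        "required": ["subject_id"],
    },
}

# The data itself, serialised as an array of JSON objects.
ROWS = [
    {"subject_id": "S1", "age": 34},
    {"subject_id": "S2", "age": 51},
]

# Only the two JSON types used in this example are mapped.
_JSON_TYPES = {"string": str, "integer": int}


def row_matches_model(row, data_model):
    """Check required fields and simple types.

    This is a deliberately small check, not a full JSON Schema validator.
    """
    for field in data_model.get("required", []):
        if field not in row:
            return False
    for field, value in row.items():
        spec = data_model.get("properties", {}).get(field)
        if spec is None:
            continue  # fields not described by the model are ignored
        expected = _JSON_TYPES.get(spec.get("type"))
        if expected is None:
            continue  # types this sketch does not know about are ignored
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            return False
    return True


# Component 2 (assumed shape): a researcher submits a custom query.
def build_search_request(query, parameters=None):
    """Serialise a query, and optional parameters, as a JSON request body."""
    body = {"query": query}
    if parameters:
        body["parameters"] = parameters
    return json.dumps(body)


if __name__ == "__main__":
    model = TABLE_INFO["data_model"]
    print(all(row_matches_model(row, model) for row in ROWS))
    print(build_search_request("SELECT subject_id FROM demo.subjects WHERE age > 40"))
```

Because the description and the query are separate, a provider could publish only the table description and its rows, which mirrors the independence of the two components noted above.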









Related Driver Projects and Organisations

Autism Speaks, Autism Sharing Initiative

Don't see your name? Get in touch:

  • Michael Baudis
    University of Zurich
  • Shu Hui Chen
    NIH National Heart, Lung, and Blood Institute (NHLBI)
  • Miro Cupak
  • Marc Fiume
  • Ian Fore
    NIH National Center for Biotechnology Information (NCBI)
  • Jonathan Fuerth
  • David Glazer
  • Dean Hartley
    Autism Speaks
  • Alice Mann
    Wellcome Sanger Institute (WSI)
  • Jimmy Payyappilly
    EMBL's European Bioinformatics Institute (EBI)
  • Kathy Reinold
    Independent Contributor
  • Nara Sobreira
    Johns Hopkins University School of Medicine

News, events, and more

Catch up with all news and articles associated with Data Connect.

14 Jul 2023
GA4GH Discovery Work Stream welcomes new Co-Lead Nara Sobreira of Johns Hopkins Medicine
4 Mar 2022
Getting started with Data Connect
11 Jan 2022
OmicsXchange episode 13: a conversation with the Data Connect technical team