Data Connect

Connects researchers to information on datasets regardless of how the data was originally stored or formatted

The ability to combine datasets from multiple projects and institutions all over the world can greatly accelerate scientific research. In the healthcare and biomedical research ecosystems, however, data generators store their data in a myriad of formats, structures, and locations. These technical differences have made it challenging for researchers to search, aggregate, and analyse data from multiple sources. At the same time, it is expensive for data providers to restructure existing data to fit the requirements of highly structured, highly opinionated sharing systems.

As a result, the data goes unshared and undiscovered. Developed by the GA4GH Discovery Work Stream, the Data Connect API aims to address this problem by providing a simple, flexible mechanism to connect researchers to information about otherwise disparate datasets, regardless of how the data was originally stored or formatted.


Benefits

  • Works with any data that can be serialised as an array of JSON objects
  • Supports federation
  • Can be implemented across a large variety of data stores
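To make the first benefit concrete, here is a minimal sketch of how rows from a conventional table might be serialised into an array of JSON objects, paired with a JSON Schema that describes each object. The `data_model` and `data` key names reflect a reading of the Data Connect table format and should be checked against the specification; the function names, columns, and rows are hypothetical, and inferring column types from the first row is a simplification.

```python
import json
from typing import Any, Dict, List, Sequence, Tuple

JSON = Dict[str, Any]


def json_type(value: Any) -> str:
    """Map a Python value to a JSON Schema type name (bool is checked before int)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    return "string"


def to_table_page(columns: Sequence[str], rows: List[Tuple[Any, ...]]) -> JSON:
    """Turn tabular rows into an array of JSON objects plus a JSON Schema
    describing one object. The "data_model"/"data" key names are assumed."""
    data = [dict(zip(columns, row)) for row in rows]
    # Infer each column's type from the first row only (a simplification).
    first = data[0] if data else {}
    properties = {name: {"type": json_type(first.get(name))} for name in columns}
    return {
        "data_model": {"type": "object", "properties": properties},
        "data": data,
    }


# Hypothetical rows, e.g. read from a CSV file or a relational table.
columns = ("sample_id", "sex", "age")
rows = [("S1", "female", 47), ("S2", "male", 62)]
page = to_table_page(columns, rows)
print(json.dumps(page, indent=2))
```

Nothing in the sketch depends on where the rows came from, which is the point of the "array of JSON objects" requirement: any store that can emit records in this shape can be exposed.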

Target users

Researchers, data custodians, and developers

Image summary: Data Connect helps researchers compare information across datasets stored in otherwise incompatible file formats and database schemas.

Product Leads
  • Miro Cupak
  • Jonathan Fuerth

Community Resources

Data Connect is a general-purpose middleware for building federated, search-based applications. The API consists of two components. The first allows data providers to describe the datasets they hold and their underlying data model using a common JSON schema. The second enables researchers to write custom queries to learn more about datasets of interest. The two components are independent of one another: a data provider can implement the first without the second.
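The split between describing data and querying it can be illustrated with a small client sketch. It assumes, based on a reading of the Data Connect specification that should be verified against the published version, a `GET /tables` endpoint listing tables with their data models, a `POST /search` endpoint accepting a JSON body with a `query` string, and paged responses linked through `pagination.next_page_url`. The `fetch` transport, the helper names, the stub server, and the URLs are all hypothetical.

```python
from typing import Any, Callable, Dict, Iterator, Optional

JSON = Dict[str, Any]
# Hypothetical transport: (method, url, json_body) -> decoded JSON response.
Fetch = Callable[[str, str, Optional[JSON]], JSON]


def follow_pages(fetch: Fetch, page: JSON, key: str) -> Iterator[JSON]:
    """Yield the items under `key` on each page, following next_page_url links."""
    while True:
        # A page may hold no items yet, e.g. while a query is still running.
        yield from page.get(key, [])
        next_url = page.get("pagination", {}).get("next_page_url")
        if not next_url:
            return
        page = fetch("GET", next_url, None)


def list_tables(fetch: Fetch, base_url: str) -> Iterator[JSON]:
    """Component 1 (description): table names and their data models."""
    return follow_pages(fetch, fetch("GET", f"{base_url}/tables", None), "tables")


def search(fetch: Fetch, base_url: str, sql: str) -> Iterator[JSON]:
    """Component 2 (query): submit a query and stream the result rows."""
    first = fetch("POST", f"{base_url}/search", {"query": sql})
    return follow_pages(fetch, first, "data")


# Offline stub standing in for a Data Connect server (URLs are placeholders).
BASE = "https://example.org"
PAGES = {
    ("GET", f"{BASE}/tables"): {
        "tables": [{"name": "demo.samples",
                    "data_model": {"type": "object",
                                   "properties": {"id": {"type": "string"}}}}]},
    ("POST", f"{BASE}/search"): {
        "data": [{"id": "S1"}],
        "pagination": {"next_page_url": f"{BASE}/search/page2"}},
    ("GET", f"{BASE}/search/page2"): {"data": [{"id": "S2"}]},
}


def fake_fetch(method: str, url: str, body: Optional[JSON]) -> JSON:
    return PAGES[(method, url)]


print([t["name"] for t in list_tables(fake_fetch, BASE)])  # ['demo.samples']
print(list(search(fake_fetch, BASE, "SELECT id FROM demo.samples")))
# [{'id': 'S1'}, {'id': 'S2'}]
```

Passing the transport in as a function keeps the sketch independent of any HTTP library; a real client would supply one that performs the requests. The two helpers share only the pagination logic, so a provider that implements just the description component would still be usable through `list_tables`.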


Meeting schedule

  • Repeats: Bi-weekly
  • Day: Wednesday
  • Time: 15:00 UTC
  • Duration: 1 hour

Releases

  • 22 Jun 2021

Related Driver Projects and Organisations

  • Autism Speaks, Autism Sharing Initiative

Contributors

  • Michael Baudis
    University of Zurich
  • Shu Hui Chen
    NIH National Heart, Lung, and Blood Institute (NHLBI)
  • Miro Cupak
    DNAstack
  • Marc Fiume
    DNAstack
  • Ian Fore
    NIH National Center for Biotechnology Information (NCBI)
  • Jonathan Fuerth
    DNAstack
  • David Glazer
    Verily
  • Dean Hartley
    Autism Speaks
  • Alice Mann
    Wellcome Sanger Institute (WSI)
  • Kathy Reinold
    Independent Contributor
  • Nara Sobreira
    Johns Hopkins University School of Medicine

News, events, and more

Catch up with all news and articles associated with Data Connect.

  • Getting started with Data Connect
    4 Mar 2022
  • OmicsXchange episode 13: a conversation with the Data Connect technical team
    11 Jan 2022
  • Data Connect: A come-as-you-are approach to data sharing
    10 Jan 2022