17 February 2020
Image credit: Stephanie Li, GA4GH
As individuals, we introduce ourselves whenever we meet someone new. We may exchange names, share what we do, and provide our contact details so we can stay in touch. The Application Programming Interfaces (APIs) that power the web, such as the API for pulling information from Google Maps into the Uber app or for using PayPal to sell products on a website, also require this “handshake” process to find and connect with each other. But within genomic and health-related data sharing, finding web services that work with certain datasets, let alone knowing how to access and connect with them, is no easy feat.
The new GA4GH Service Info and Service Registry APIs make this process easier. Together, these minimal, lightweight APIs provide a standard format for listing genomics web services along with their metadata, including key attributes such as a unique identifier, a human-readable name, the service host, and how to connect with it.
The new APIs are intended to be added as extensions to other specifications developed by the Global Alliance for Genomics and Health (GA4GH), such as the Beacon API for making genomic data discoverable on the web or the Data Repository Service for allowing researchers to share data in the cloud.
Service Info is an endpoint for describing GA4GH service metadata as part of another API, and Service Registry organizes services implementing Service Info into groups or networks. Simply put, Service Info is used to describe a single service, while Service Registry is used to describe multiple services.
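The relationship between the two can be pictured with a minimal Python sketch. The class names, field names, and example URLs below are illustrative assumptions chosen to mirror the attributes mentioned above (identifier, name, host, service type); they are not quoted from the GA4GH specifications.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceInfo:
    """Metadata a single service publishes about itself (illustrative fields)."""
    id: str    # unique identifier
    name: str  # human-readable name
    url: str   # where the service is hosted
    type: str  # which API the service implements, simplified to a string


@dataclass
class ServiceRegistry:
    """A group or network of services, each described by a ServiceInfo."""
    services: dict[str, ServiceInfo] = field(default_factory=dict)

    def register(self, info: ServiceInfo) -> None:
        # Key each entry by its unique identifier.
        self.services[info.id] = info

    def list_services(self) -> list[ServiceInfo]:
        return list(self.services.values())


# Hypothetical services, used only for illustration.
registry = ServiceRegistry()
registry.register(ServiceInfo("org.example.beacon", "Example Beacon",
                              "https://beacon.example.org", "beacon"))
registry.register(ServiceInfo("org.example.drs", "Example DRS",
                              "https://drs.example.org", "drs"))

print([service.name for service in registry.list_services()])
```

In this picture, one `ServiceInfo` record describes one service, and the registry is simply a collection of such records.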
“Like accents and dialects, each API and institution has its own take on how to communicate information about their services; meaning we lack a single and consistent way to describe capabilities. If we function this way, we must become polyglots or choose not to communicate at all,” said Andy Yates, Team Leader of the Genomics Technology Infrastructure at EMBL-EBI and product lead of the Service Info API. “The beauty of Service Info and Service Registry is their simplicity—they open up the service discovery process so that anyone can find a service for their study, using a standard simple format.”
When the two APIs are implemented together, a user can query the Service Registry API to discover available services, and then view more information about each one through its Service Info endpoint. This facilitates service discovery across organizational boundaries without imposing communication restrictions on those services.
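As a rough sketch of that two-step flow, the Python below first asks a registry for its list of services and then reads each service's own description. Everything here is an assumption made for illustration: the `/services` and `/service-info` paths, the JSON fields, and the example.org hosts are hypothetical, and canned responses stand in for real HTTP calls so the example runs offline.

```python
import json
from typing import Callable

# Hypothetical canned responses standing in for HTTP GET requests.
FAKE_RESPONSES = {
    "https://registry.example.org/services": json.dumps([
        {"id": "org.example.beacon", "url": "https://beacon.example.org"},
        {"id": "org.example.drs", "url": "https://drs.example.org"},
    ]),
    "https://beacon.example.org/service-info": json.dumps(
        {"id": "org.example.beacon", "name": "Example Beacon", "type": "beacon"}),
    "https://drs.example.org/service-info": json.dumps(
        {"id": "org.example.drs", "name": "Example DRS", "type": "drs"}),
}


def fake_get(url: str) -> str:
    """Return the canned body for a URL (replace with a real HTTP client)."""
    return FAKE_RESPONSES[url]


def discover(registry_url: str, wanted_type: str,
             get: Callable[[str], str] = fake_get) -> list[dict]:
    """Find services of one type: list the registry, then read each description."""
    # Step 1: ask the registry which services it knows about.
    listing = json.loads(get(f"{registry_url}/services"))
    matches = []
    for entry in listing:
        # Step 2: ask each service to describe itself.
        info = json.loads(get(f"{entry['url']}/service-info"))
        if info["type"] == wanted_type:
            matches.append({**info, "url": entry["url"]})
    return matches


found = discover("https://registry.example.org", "beacon")
print(found)
```

Passing the fetch function as a parameter keeps the discovery logic separate from the transport, so the same sketch could be pointed at a real HTTP client without changing `discover`.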
Image credit: Stephanie Li, GA4GH
“As the GA4GH ecosystem expands and we move towards a more integrated and federated system of services and APIs, we often find ourselves dealing with networks or groups of services. We need to manage these services in a consistent way, and Service Registry helps us tackle this challenge,” said Miro Cupak, Co-Founder and VP of Engineering at DNAstack and product lead of the Service Registry API.
ELIXIR, the European life sciences infrastructure, has implemented the Service Registry API within its ELIXIR Beacon Network—a series of ELIXIR Nodes that have implemented the GA4GH Beacon API to make their human genetic data discoverable. As a result, investigators can find data of interest by querying the ELIXIR Beacon Network as a single “registry of beacon services” instead of querying each Beacon individually.
“Solutions like the ELIXIR Beacon and the ELIXIR Beacon Network are already leveraging a concept almost identical to Service Registry and Service Info,” said Jordi Rambla, Team Leader at the Centre for Genomic Regulation (CRG) and a GA4GH Driver Project Champion of ELIXIR Beacon. “Having a community standard to share that information gives us the opportunity to integrate seamlessly with other solutions around the world.”
Rambla continued: “The ELIXIR Beacon Network provides deployable versions of products that implement these standards so that any interested institution—whether an ELIXIR member or not—can leverage them for their own deployments or development work.”
Taking a different approach, the Autism Sharing Initiative (ASI) has implemented the Service Registry API to expose which GA4GH APIs researchers can use to access ASI data.
“By introducing this standard, we will be sharing the GA4GH services currently implemented in our genetic database MSSNG,” said Dean Hartley, Senior Director of Genomic Discovery and Translational Science at Autism Speaks and a GA4GH Driver Project Champion of ASI. “We hope this will be just the beginning for building new tools and workflows to accelerate interoperability across different platforms and environments, with the intent to aggregate into larger datasets. This will be critical to ASI’s mission and, more importantly, to our understanding of autism.”
In the future, the team plans to extend the Service Info and Service Registry APIs across the suite of GA4GH standards and specifications to improve service discovery. They are also considering more advanced features for Service Registry, including methods to create, update, and remove services, functions to monitor the health of services, and ways to describe more complex network topologies.
“Finding data is hard,” said Cupak. “This challenge often starts with finding services and APIs that can be used to obtain the data. By having a standard way of representing and discovering these services, we can make data sharing much more effective at scale.”