
GA4GH shares seven open source projects as part of Google Summer of Code 2021


For the seventh year in a row, the Global Alliance for Genomics and Health (GA4GH) is participating in Google Summer of Code (GSoC)—a global program that brings student developers into open source software development. 

As a global standards organization, GA4GH develops technical specifications to promote harmonization and interoperable analysis of genomic and related health information. An increase in open source tools and implementations drawing upon GA4GH standards will help the broader scientific community converge on common data models and processes, advancing research and clinical care.

The GA4GH community is excited to mentor students from all over the world who are interested in tackling current challenges in genomics. This year, the community will be focusing on the following seven projects:

    1. Pure Rust serverless htsget implementation: The GA4GH htsget API provides a protocol for securely accessing genomic and variation data. While an implementation of the API already exists in the Go programming language, this project aims to implement the htsget API in Rust on AWS Lambda, with the goal of demonstrating interoperability and facilitating diverse software options in the field.
    2. vcf2fhir for Structural Variants: The vcf2fhir converter is an open source utility for converting variants from the VCF format, maintained by GA4GH, into HL7 FHIR, an emerging standard for electronic health record interoperability. The utility currently supports the conversion of simple variants only; this project aims to extend it to structural variants.
    3. React diagram component library for creating pedigree drawing tools: GA4GH is currently developing the GA4GH Pedigree Standard to model pedigree and family information for large-scale genomic analysis. To complement the standard, this project aims to develop a React component library that can be used to produce pedigree diagrams based on data stored in a database.
    4. Implement GA4GH TES in Galaxy: GA4GH has recently approved the Task Execution Service (TES) API, a standard enabling federated and distributed computing of tasks across a network of participating compute centers. Galaxy is a popular platform for bioinformatics analysis that helps researchers run analysis workflows in the cloud. This project aims to add support for the TES API in Galaxy in order to increase interoperability between the Galaxy platform and GA4GH-compliant cloud solutions.
    5. Brokering Continuous Delivery through the ELIXIR Cloud service registry: Rolling out updates to deployed implementations of GA4GH standards commonly requires close communication between development and deployment teams. GA4GH Driver Project ELIXIR Cloud & AAI has set up a centralized service registry to track available deployments and provide lists of specific services requested by clients. Building on that registry, this project aims to implement a decoupled, publish-subscribe-based continuous integration and delivery solution.
    6. Implement reusable GA4GH UI clients: A major obstacle to operationalizing GA4GH solutions is that various organizations are building their own web portals for accessing GA4GH microservices from the ground up, with little code reuse and few contributions from external developers. In order to maximize community productivity and to allow small development teams to contribute meaningfully to this process, this project aims to provide clients for GA4GH microservices in the form of reusable, user-friendly “micro-frontends.” Specifically, the project will focus on implementing open source clients for the GA4GH Tool Registry Service (TRS) and Data Repository Service (DRS) specifications as Web Components.
    7. Development of a user interface for the Ensembl Variant Effect Predictor neXtProt plugin as one of the community tools hosted on the neXtProt portal: Several open source tools, such as the Ensembl Variant Effect Predictor (VEP), have been developed to predict the structural and functional effects of variants on genes, transcripts, protein sequences, and even regulatory regions. The VEP tool has numerous plugins including a neXtProt plugin, which integrates manually curated information from neXtProt in order to improve the accuracy of predictions. This project aims to develop a web-based user interface for the VEP neXtProt plugin and include it as one of the community tools hosted on the neXtProt portal.

To learn more about GA4GH’s GSoC projects, visit here. Student applications for GA4GH GSoC projects close on April 13. GA4GH does not endorse or claim ownership of any deliverables developed through the GSoC projects.