29 June 2020
Angela Page: Welcome to the OmicsXchange, I’m Angela Page. Today we continue to discuss the role of data sharing during the COVID-19 pandemic, speaking with members of the international genomics community on initiatives that leverage collaboration, interoperability, and open science to advance research into the novel coronavirus. Today we speak with Kathi Lauer, a virologist and Industry Officer for External Relations at the ELIXIR Hub in the UK, about ELIXIR’s efforts to respond to the pandemic. Kathi originally trained as a virologist and vaccine designer, and she was involved in the response to the 2015 Ebola outbreak in Sierra Leone. Given her background in public health and industry stakeholder outreach, Kathi now coordinates ELIXIR’s COVID-19 response throughout Europe. Welcome, Kathi.
Kathi Lauer: Thank you very much for the invitation.
Angela Page: So I know you have a background in viral genomics and you participated in Public Health England’s response to the 2015 Ebola outbreak. From your experience, why do you think it’s important to prioritize data sharing during a pandemic?
Kathi Lauer: So in general, I would say that open science and data sharing help to catapult scientific discovery forward. What is different in an outbreak scenario with an unknown pathogen is the immediate urgency to react. And of course, no one can do that single-handedly. And in science some things are always scarce and hard to come by: skilled personnel, specialists, and funding. But pulling together and sharing means that we can achieve solutions more efficiently. Data is the basis for how we react to a pandemic in terms of diagnostics, developing treatments or preventive measures, understanding transmission, or, for example, developing informed policies, and this ultimately translates into saving people’s lives. In the current coronavirus outbreak, vast amounts of new data are being generated, and this is due to advancements in technology such as sequencing. These data need to be analyzed and also put in context within pre-existing data so they can be shared for solution-centered research.
Angela Page: ELIXIR is playing a large role in advancing COVID-19 research and data sharing—both in Europe and around the world. For those who aren’t familiar, can you provide a quick intro to ELIXIR?
Kathi Lauer: So ELIXIR is an intergovernmental organization with 22 member countries, plus the European Bioinformatics Institute. And we currently have about 220 bioinformatics institutes involved. What we try to do is to coordinate this bioinformatics community and really streamline their efforts. And through those affiliated institutes, we can provide services that fall into the areas of databases, standards, analysis tools, workflows, compute in cloud, interoperability resources, as well as training. This infrastructure was put in place over five years ago, and since then, has brought together many experts in the data-driven life sciences. And they’ve worked collaboratively on projects, and many of these projects have to do with responsible sharing of data.
Angela Page: So why don’t you tell us a little more about ELIXIR’s strategy and efforts to facilitate COVID-19 data sharing.
Kathi Lauer: So the big advantage we had in a situation such as this is that we already had the infrastructure and the network in place. In our initial response to COVID-19, we focused on compiling services from across our members with a specific offering for research on COVID-19. And we expand on them as they evolve. What we see at the moment is that most European countries are planning to collect human sequences and associated phenotypic and clinical outcome data in relation to COVID-19; for example, to inform genetic association studies for host factors that determine susceptibility or severity of disease. These datasets need to be stored in a secure and GDPR-compliant manner. So there’s this really urgent demand for data sharing for human host genetic and phenotypic data. ELIXIR’s strategy moving forward supports convergence of those member states into what we call interconnected European data hubs for COVID-19 research. These data hubs are building on the infrastructure that was developed by the European Bioinformatics Institute. So, in April 2020, they launched their COVID-19 Data Portal, which brings together relevant datasets for sharing and analysis. An example of how we facilitate this is through the European Genome-Phenome Archive (EGA). This is a resource for secure archival of and access to potentially identifiable genetic and phenotypic data. We know that much of the COVID-19 host data will come directly from clinical care, and it will likely be subject to national laws and regulations, and much of the data is very unlikely to be consented to leave the national jurisdictions. So as part of our COVID-19 response, ELIXIR will accelerate the implementation of a federated European Genome-Phenome Archive, and really provide the necessary infrastructure and coordination between European national initiatives and therefore ensure that sharing of COVID-19 host data can be guaranteed in full compliance with national regulations.
Angela Page: As you described, facilitating coordination among countries is and will be key to advancing COVID-19 research. Why is global collaboration important during a pandemic?
Kathi Lauer: Research is always a collaborative effort, and the easier it is to reproduce and confirm results, the more we can rely on and trust those results and transform them, for example, into policies. Open access to data and tools really allows innovators in the public and in the private sector to provide solutions, as we have seen in the past months, for example repurposing of already existing drugs, development of tracing apps, et cetera et cetera. Actually I have a really great example to share that highlights the value of open access and collaboration across borders. So in February, a paper was published on bioRxiv by a group of scientists. This group of scientists took four publications describing genomic features of COVID-19 and tried to reproduce the results on public infrastructure. The analyses were performed using only free software and deployed on Galaxy platforms. So Galaxy is an open web-based platform for computational biomedical research. It allows researchers without a vast amount of programming experience to run data analysis workflows on their data, share this analysis with others, and really enable others to repeat the same analysis. These Galaxy platforms were in the US, Australia, and two platforms operated by ELIXIR nodes, ELIXIR Germany and ELIXIR Belgium. It’s really great to see how this effort was possible, and it’s building on many many years of collaborative work.
Angela Page: So you mentioned that the Galaxy platforms are powerful tools for running analyses in the cloud. How has cloud computing impacted research during this time?
Kathi Lauer: One massive advantage that really helped many researchers is that the cloud allows a shift to digital solutions while retaining productivity, and we’ve all experienced that by working from home in the last couple of months. Cloud computing is really becoming the norm for creating environments for sharing tools, workflows, and data repositories in bioinformatics, including now COVID-19 research. This was, for example, successfully demonstrated during a virtual BioHackathon that was focused on COVID-19 and which was facilitated by ELIXIR. Organizations like the GA4GH play an important role in agreeing on and defining shared cloud standards for such cloud-based tools, workflows, and compute, as well as agreeing on standards to manage identity and access to such cloud-based resources.
Angela Page: An important aspect you’ve mentioned several times now is the need for standards—can you elaborate on that?
Kathi Lauer: Oftentimes, we are not really stuck with the technology; we are stuck because we cannot agree, for example, on standards. And in this regard, FAIR plays an incredibly important role. So FAIR stands for findable, accessible, interoperable and reusable, and they are all cornerstones of data sharing. We started an IMI project called FAIRplus last year, and that really addresses how people can “FAIR-ify” their data. And at the moment, they’re creating a FAIR Cookbook with recipes to FAIR-ify your data. So the idea is that people can just take this FAIR Cookbook off the shelf and don’t have to even really think about where to start because the recipe is there. And this is again where collaboration and trust help to pave the way. And strategic partnerships such as the one with GA4GH are of utmost importance in a pandemic like this, where we need to agree on things fast. An infectious disease just doesn’t care about borders; having a network of global contacts helps us to act rapidly and efficiently, and in a scenario like this, trust has very high priority.
Angela Page: In addition to the COVID-19 Data Portal and the Galaxy Platforms deployed by the ELIXIR nodes, what other standards and tools have ELIXIR developed to promote data sharing?
Kathi Lauer: The COVID-19 Disease Map is led by ELIXIR Luxembourg and has many many many contributors. The last I spoke with them it was about 196 contributors. I have mentioned this before: these links between the different ELIXIR members already existed before the pandemic, so it was very, very easy for the initiative to quickly build on existing synergies and generate this resource. This resource allows visual exploration and computational analysis of molecular processes involved in SARS-CoV-2 entry, replication, host-pathogen interactions, or host cell recovery and repair mechanisms.
The COVID tools registry is built on a service called bio.tools; this is a registry of software and databases, ranging from command-line tools and online services through to databases and complex multifunctional analysis workflows. Everyone can add their service to this registry and expose it to a wide range of users. So in response to COVID-19, a specific target has been created to highlight services of relevance to COVID-19 research. The Workflow Hub is part of the European Open Science Cloud; it is a workflow registry which is currently under development, and it is based on Common Workflow Language and Research Objects to tie in federated workflow and tool descriptions across different research infrastructures. So the aim is really to support workflow discovery, reuse, preservation, and interoperability using standard protocols.
Angela Page: How do you expect ELIXIR will leverage learnings from the COVID-19 pandemic for future outbreak response?
Kathi Lauer: What we’re really aiming to do is to develop a sustainable long-term solution for pandemic preparedness, by really anchoring services in already existing structures built for the future. Moving forward, the higher-level objectives that we have are providing access to and linking data between national infrastructures and resources. We want to give access to cloud and storage resources for COVID-19 research and support the collaborative development of open, reusable, and reproducible computational workflows. So what this means in practice is to develop and coordinate these interconnected European data spaces that really bring together national health informatics infrastructure, cloud computing infrastructure, and the wider European infrastructure. And this interconnected ecosystem will then allow data from ongoing European projects, as well as many refocused national research programs, to be widely shared and used. So ELIXIR’s response is very much aligned with the European Bioinformatics Institute’s action plans for COVID-19, where there is a significant current investment in these COVID-19 data hubs and in sequence sharing. And I have mentioned the federated European Genome-Phenome Archive for transnational access to COVID-19 patient data; we hope to scale that up across all our other member countries. Moving forward, we want to foster data management practice to make COVID-19 data open, FAIR, and long-term reusable. And we want to provide training in this space as well.
Angela Page: What do you think will be important for us as a global community to focus on moving forward?
Kathi Lauer: I’m blown away by how people really went the extra mile to put resources together to move science forward and try to save lives. But what really made it easy for us is that we already had this infrastructure in place. These people have worked with each other for many, many years. And I think it is important to realize that investment in those infrastructures is really important when it comes to a situation like this. Of course, it’s a steady learning experience, because we’ve never been in a situation like this before. But everything we do in ELIXIR definitely has a sustainability plan as well. We want to make sure that we are set up for pandemic preparedness in the future.
Angela Page: Well, this has been a great discussion, thank you so much for being here, Kathi.
Kathi Lauer: Thank you very much.
Thank you for listening to the OmicsXchange—a podcast of the Global Alliance for Genomics and Health. The OmicsXchange podcast is produced by Stephanie Li and Caity Forgey, with music created by Rishi Nag. GA4GH is the international standards org for genomics, aimed at accelerating human health through data sharing. I’m Angela Page and this is the OmicsXchange.