OmicsXchange episode 8: HDR UK’s COVID-19 response — an interview with Caroline Cake

27 Jul 2020

On episode 8 of the OmicsXchange, we are speaking with Caroline Cake, CEO of Health Data Research UK, about the HDR UK response to the ongoing COVID-19 pandemic and their role in setting up the International COVID-19 Data Research Alliance.

Angela Page: Welcome to the OmicsXchange, I’m Angela Page. On today’s episode, we are speaking with Caroline Cake, CEO of Health Data Research UK, about the HDR UK response to the ongoing COVID-19 pandemic. They were also instrumental in setting up the International COVID-19 Data Research Alliance, which was launched earlier this month. To share a little more on our guest, Caroline started her career as a chemical engineer. She received an MBA from Harvard Business School before entering the health management industry in the UK. Welcome, Caroline.

Caroline Cake: Delighted to be here. Thank you for inviting me, Angela.

Angela Page: So, tell me a little bit about your role within HDR UK.

Caroline Cake: I am the chief executive of Health Data Research UK. I’m involved in delivering the strategy, helping design and set strategy, and ensuring that we as an organization are delivering on our mission, which is about uniting health data to enable discoveries and improve people’s lives. Across the UK, there’s a very fragmented environment within the NHS, within academia, within charities, so a number of different settings where there is data. So one of the things we’re working on is how you bring that whole collection of people together. We’re not a data custodian ourselves, so what we do is enable and bring that group together, establishing standards and practices and ways of working. So we provide wrap-around activities to enable a researcher to work with that really fragmented landscape effectively.

Angela Page: From your perspective, why is it important to share data during a global pandemic?

Caroline Cake: If you think about what’s been happening in COVID so far, as an example of a global pandemic, needing to rapidly know what’s happening, needing to rapidly know who’s affected, how they’re being affected, which population groups are not being affected, which treatments are proving to work and which are not, which vaccines are working or not working, all of this requires a very large amount of data to tell you what’s actually going on. Data is an essential aspect of a national and global response to a pandemic.

Angela Page: So how is HDR UK specifically responding to COVID-19?

Caroline Cake: What we’ve done is work very directly with all of the sites. And so we do work associated with science, with training, and with data, establishing the data infrastructure in the UK to respond to questions and to enable the data. What we’ve done through COVID then is pivot all of those activities so they’re really focused on enabling the COVID response. We’ve also been working very directly with SAGE, the Scientific Advisory Group for Emergencies, in the UK, supporting the government on the scientific response. First of all, ensuring that we know what the health data questions are that need answering that require data, and that they’re prioritized, and we’re really clear on working on the ones that most urgently need answering first. Secondly, that the data assets are set up to enable answering those questions and people get access to them. And thirdly, that we’re seeing the outcomes and the insights generated from that health data research being communicated and rapidly getting to those answers so people know what’s been found, and what we are learning as we’re going through. So during the most urgent period, we were providing a weekly feed into it around health data research questions. So in doing those three things, we’re really creating this engine of health data research to inform across a wide range of different questions.

Angela Page: What are some examples of the health data questions that you’re seeing emerge from the community?

Caroline Cake: We’re seeing a real coming together around the questions. So first of all, a sense of are there particular groups who’ve got an increased rate and risk of infection, so particularly on the Black, Asian, and ethnic minority groups, that has been a big focus, a big question. Another one has been about the impacts on vulnerable groups, so ones who may have already got cancer or cardiovascular disease, how they’re being affected directly by the disease but also indirectly by it, and also really then being able to understand what the most appropriate interventions are to protect them. This also goes into vulnerable groups associated with mental health as well, so a wide range of looking at the types of issues and who’s most affected. Then also connected to that there are questions associated with the knock-on effect for health and care provision. So for example, cancer diagnostics and treatments, how have those been delayed and impacted through the COVID period, associated with lockdown happening or activity being diverted to supporting COVID. There are then also a whole series of questions around effectiveness of treatments. So for example, working on clinical trials and being able to see through the data what the impacts and effects are. So there’s a whole series of really, really interesting questions that have come out of this, that I think would also be useful for future pandemics as well.

Angela Page: One of the key pieces of HDR UK is the Innovation Gateway. Can you explain what this is, and how it’s involved in the COVID-19 response?

Caroline Cake: So the gateway is a really cool tool. If you imagine all the different data custodians have all their data, this provides a common portal, so if you’re a researcher coming and you’ve got a question you want to know about cancer, or you want to know about COVID, you can just type that into the portal, and it will then return all of the datasets associated with that that exist. It will also return any research associated with that, the people who are working on it, and tools that you might want to use. So the aim is to provide the person who’s wanting to answer these questions with all the resources they need to be able to quickly answer them. So say you select a dataset and say, “Okay, I’m interested in this one associated with cancer,” you can then click into it and it will tell you who the custodians are, but it’s also putting in place a standardized access process. It has a whole load of metadata so you can find out more about that dataset, what groups it represents, where it’s gathered from. It’s a whole series of information on it, but then you can actually click through into it and request access. And that’s then how that streamlined access process starts happening from there. It’s a means to empower the research community so that they can get the data they need fast, effectively, safely, securely. It’s got over 450 datasets on it at the moment, and we’re adding more continuously. It was just established this year, so it’s very new and up and coming, but it’s already got a huge number of users and a lot of datasets accessible through it. So we’re just accelerating rapidly at the moment, bringing on more functionality as well.

Angela Page: So it sounds like having that streamlined approach to accessing data really is vital.

Caroline Cake: Data access is a very key component. It’s great to have data, but if you can’t access it, that’s a bit of a challenge. So one of the things we do is working with GA4GH, in terms of using the thinking that has happened and the approaches that have been developed to ensure we can establish consistent access processes, and then streamlining those processes so that we can get decisions made quickly, with much higher quality requests coming through, so that we’re losing less energy and time in navigating that process, and spending more time on doing the actual research and asking the great questions, and using the data. And that directly relates to COVID. And we’re integrating that into the gateway and the kind of common portal so that it’s automated into there and consistent across all the data custodians.

Angela Page: That’s a great example of data access standards being used in real-world practice to facilitate more efficient research. I just want to give a quick shout-out: any of our listeners who are interested in getting involved in this work of standards development can join the Data Use and Researcher Identities Work Stream, which is managed by Melissa Konopko, and you can find her contact information on our website if you’re interested. So Caroline, I remember hearing early on that HDR UK was embedding the GA4GH Framework into its Principles for Participation. This is the Framework for Responsible Sharing of Genomic and Health-Related Data. Can you just tell us a little more about how it’s being used within HDR UK?

Caroline Cake: Yes. So we have this alliance of health data custodians, 32 of them in the UK. And every member of the alliance signs up to our principles for participation. Those principles have the Framework embedded within them. So every organization is required to behave in a way that follows the FAIR principles, holding the data securely and in a trustworthy way, but also ensuring that they’re orientating towards data sharing, rather than just data controlling.

Angela Page: I think that’s an interesting point that you’ve brought up, around trust – can you just expand on that a bit?

Caroline Cake: So we’re doing a lot in terms of using trusted research environments: where that data is being held, who can access it, this kind of concept of the five safes, associated with have you got safe people, have you got it in a safe environment, is the project using data a safe project, so that you can be systematically sure that the data is being used in a trustworthy way. And also we work very closely with the public, patients and practitioners to really understand what people’s views are, what the concerns are, but also actively involving them in every decision associated with this. During COVID, we brought together a group of 62 people within the UK as a public patient advisory panel. One of the teams was working on a risk calculator associated with COVID and engaged the public advisory group to ask what their thoughts were, testing with them how comfortable people are with this idea. So patient involvement and engagement is tied into everything that’s happening, because people need to trust what’s happening here, so that’s a core part of it.

Angela Page: So switching gears a bit, at the beginning of our conversation, you mentioned how important it is for this work to be done on an international level. And we’ve been hearing about the International COVID-19 Data Research Alliance that you guys have been heavily involved in. Can you tell us what the purpose of this international alliance is, and how it’s making it easier to share, access, and analyze data?

Caroline Cake: This is working in partnership with a range of different organizations internationally so that researchers can work internationally on datasets, discover the datasets, join and link up data to really solve these problems. COVID is an international pandemic; solving it country by country seems slightly bonkers. So being able to bring together the different areas so that we are looking collectively across countries is what is happening on this. That is a big data challenge as to how to make that possible and how to do it. And that’s what this endeavor is: to bring together the alliance of organizations who are partnering to make that possible and to develop the standards and ways of working on that, but also then having the workbench, which will be providing the kind of technical part of how the data is actually being accessed and used. And then that links through to the gateway I mentioned earlier, so the researchers can discover it. So the different components add up together so that as a researcher, you can start having a seamless way of actually working with this data. The gateway is agnostic to what the different disease groups are. So we’ll be adding more international components on in the future.

Angela Page: What is HDR UK’s role specifically?

Caroline Cake: So we’re playing a kind of coordinating, convening, leadership type of role within that, bringing together different partners, with each organization bringing its best characteristics and expertise. So from GA4GH, with all the great work you guys have been doing, how can we then just adopt, build on, and accelerate from that point of view. So it’ll be drawing together the different talents from the different organizations.

Angela Page: So how has HDR UK’s work contributed to the evolution of our understanding of COVID-19?

Caroline Cake: So if you look back at the early stages, you could see there was a kind of understanding initially that there seemed to be an age difference. There seemed to be a gender difference. We could see very quickly that the hospital settings were reporting a different profile of who was actually having the most severe outcomes. And you could see that from the mortality data pretty quickly. That then was able to turn into analysis, asking, well, are we seeing a different ethnicity profile happening here? Some analyses were spotting it, and some weren’t. So you could see there was a difference in analysis, depending on what methods were being used and what kind of sources they were drawing on. And then it became clear from the data that actually we needed to separate two things. One was who was infected, where there was an ethnicity aspect that seemed to be associated more with socio-economic factors. And then there was a second issue: once infected, what the severity of outcome was. So you could start seeing these two different parts of the data unfolding over time as the knowledge grew and people pursued each thread further.

Angela Page: Historically, genomic data is not representative of the global population—it’s very much centered on people of European descent, as we know. How are we doing this time around with gathering diverse genomic and health data from COVID patients?

Caroline Cake: We just came off a call with our diversity and inclusion team in terms of actually the importance of diversity of the data being a very core part. Have we got all parts of society appropriately represented in different cohorts and different datasets? And are we gathering data consistently? So how we actually define ethnicity, how our understanding is different in different datasets. So I think it opens a whole series of questions around how we look to enhance and improve the quality of our data so that it truly is representative and we can cope with these sorts of questions in the future. So I think going forward, it will become a lot richer. I think there is an important aspect in terms of how cohorts are developed, how we ensure they are truly representative, how we work across different national and international boundaries. So for example, one of our colleagues is working very closely with teams in Bangladesh. So actually building a profile in different countries and things as well. So it’s really important as to how we collectively work together internationally on this as well.

Angela Page: And this just sounds like yet another reason why global collaboration really is so important and why the work of the International Alliance is critical.

Caroline Cake: It struck me though how we’re still very UK focused and answering the questions using UK data, and it does feel that there is such an international opportunity to collaborate on. We could join up sooner, faster, and be ready to run in future pandemics. For data analysis and health data research, you need data at scale to get really good answers. We’re still in the early stages of COVID, and with datasets that are too small, you get very different answers. It does feel like a massive opportunity to work on the international stage.

Angela Page: So as we look ahead, how would you say that the work that you guys are doing now related to COVID-19 is helping to fuel your broader mission for health data sharing overall?

Caroline Cake: A very large proportion of what we do is now about COVID. It’s absolutely horrific what’s going on, but from a health data science point of view, it’s a really valuable, interesting area to demonstrate the impact that health data science can have. Most of what we’re doing is orientated towards that. So for example, the gateway—we need to build that functionality anyway, it’s just being applied to the COVID situation. The Alliance—we need to be building and developing those standards of trusted research environments associated with that. The data quality processes, we’re just applying it to COVID. So it’s accelerating the work rather than creating extra work.

Angela Page: It’s so important that HDR UK and the International Alliance are making these efforts right now. I think that is a great note to end on, and I want to thank you so much for joining us today, Caroline. This really has been a fascinating discussion.

Caroline Cake: Thank you very much!

Thank you for listening to the OmicsXchange—a podcast of the Global Alliance for Genomics and Health. The OmicsXchange podcast is produced by Stephanie Li and Caity Forgey, with music created by Rishi Nag. GA4GH is the international standards org for genomics, aimed at accelerating human health through data sharing. I’m Angela Page and this is the OmicsXchange.
