OmicsXchange Episode 16: Atlas of Variant Effects Alliance and interpreting human genetic variation with Clare Turnbull and Lea Starita

Angela Page 0:05  

Hello, and welcome to the OmicsXchange. I’m Angela Page. Today, I’m speaking with Dr Clare Turnbull, [a] professor at the Institute of Cancer Research, and Dr Lea Starita, an assistant professor at the University of Washington, about the Atlas of Variant Effects Alliance and interpreting human genetic variation. Welcome, Lea and Clare. Can you introduce yourselves and tell me, you know, how you are involved with AVE and also, just for my knowledge, how aware you are of GA4GH and what’s going on here?

Clare Turnbull 0:38  

Sure, so my name is Clare Turnbull. I’m a professor of translational cancer genetics at the Institute of Cancer Research in London. And I’m also [an] NHS consultant, [a] clinical geneticist at the Royal Marsden Hospital, and [an] Honorary Consultant in Public Health at Public Health England. So my kind of jobbing clinical role is seeing patients who have a pattern of cancer in themselves and/or their family that looks to have a genetic basis, and undertaking the relevant types of analysis of genetic testing and returning the results to them.

But I also, as part of my research role, have funding from Cancer Research UK that supports [the] work I do in conjunction with Public Health England, collecting data from all the labs across the country, from their historic and current testing of cancer susceptibility genes. And as part of this work, I lead a national group where all of the labs from across England, Scotland, Wales, Northern Ireland, and the Republic of Ireland meet once a month, and it’s the Cancer Variant Interpretation group. And we try to improve the standard and consistency of interpretation of variants in cancer susceptibility genes. And that is what’s led me to a very strong interest in working with scientists such as Lea and colleagues in ensuring the development of MAVEs and, even more importantly, translation of the findings, and accessibility of findings to the clinical diagnostic community.

Lea Starita 2:32  

Hi, I’m Lea Starita and I’m an assistant professor of Genome Sciences at the University of Washington in Seattle. I’m also the co-director of the Brotman Baty Advanced Technology Lab at the Brotman Baty Institute for Precision Medicine. And so, Clare just introduced this term MAVE, which is multiplexed assay of variant effect, and I try to develop new technologies so that we can access new phenotypes to try to understand the effect of genetic variation on the function of proteins, or enhancers, or promoters, or any sequence of the genome.

Angela Page 3:09  

Okay, fabulous. Thank you. So, we’re here to talk about the Atlas of Variant Effects, which is an international alliance to bring together data, variant interpretation data, but specifically using this multiplexed assay technology. So, can we start there? And just have you kind of explain to us what this technology is, and how it’s different? Why this sort of requires its own consortium?

Lea Starita 3:35  

[So], I think the reason it might require its own consortium is because the genome is a very big place. And so, you know, no one lab, no one group could, with current technology, understand many of the changes in the genome. And so part of the problem is, is that you know, sequencing has become more widespread, especially in the clinic. And what happens is, it turns out that we are much better at reading genomes than understanding what we find.

So, you know, we sequence a gene or a panel of genes, and then, you know, humans like to be very different from each other. And we’re finding all this rare variation, so rare coding, rare changes in the genome, and they could be in coding regions. And even those, which should be kind of easier to predict, and in the 1% of the genome that might cause the biggest effects, we really still just are almost clueless when it comes to understanding… the effect of a new variant that we have found a priori.

And so one of the ways to figure that out is to actually make that variant and test whether or not, you know, the protein is still functional or the promoter is still functional, or something like that. And again, because the genome is such a big place, and there [are] many genes, and many promoters and many different people have different expertise [on] different genes, you know, [and] we’re trying to gather all of those people to do these sorts of experiments.

And these experiments are important because it’s one of the most trustworthy ways to get a measurement of variant effects. And what we mean by multiplexed, right, so again, the genome is a big place, even each gene can... Right, so if you take a gene that was 100 amino acids, that’s 300 bases, that’s 900 mutations, possible single-nucleotide mutations, right, so each base, you could change to three other bases. So that would be 900 mutations you would have to do if you wanted to understand the effect of any possible variant in that gene. That’s a lot of postdoc years if you’re going to do that one at a time, right. And so these multiplexed assays, what we do is we try to make every possible variant, and hook these assays up to sequencing so that the genotype is linked to the phenotype, so that we can use sequencing as a readout. And in that way, we’re able to measure the effects of, like, hundreds of thousands of variants in a single experiment.
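The arithmetic in Lea’s example (100 amino acids, 300 bases, 900 possible single-nucleotide changes) can be checked with a short Python sketch. This is only an illustration under stated assumptions: the function name `single_nucleotide_variants` and the placeholder sequence are hypothetical, and the sequence is not a real gene.

```python
BASES = "ACGT"


def single_nucleotide_variants(seq):
    """Yield (position, ref, alt) for every possible single-base substitution."""
    for pos, ref in enumerate(seq):
        for alt in BASES:
            if alt != ref:  # each base can change to the three other bases
                yield pos, ref, alt


n_amino_acids = 100
coding_length = n_amino_acids * 3      # three bases per codon
toy_sequence = "ACG" * n_amino_acids   # placeholder sequence, not a real gene

variants = list(single_nucleotide_variants(toy_sequence))
print(coding_length, len(variants))    # 300 900
```

Testing those 900 substitutions one at a time is the "postdoc years" problem Lea describes; a multiplexed assay aims to measure them together in one experiment.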

Clare Turnbull 6:24  

As a user of these data rather than a generator of them, I can only speak to how transformative it was when Lea and colleagues published what was one of the first MAVE publications. And suddenly, we had reliable functional data on nearly every possible substitution that could arise in the key regions of BRCA1. This completely rocked our world in terms of how we classified variants in this gene.

And prior to applying the types of technology that Lea has described, as she said, because functional assays were so low throughput, most publications only described a handful at most, I don’t know, 20, 30, 40 variants. So, they’d pick out a handful of, you know, kind of ones that have been detailed as problematic in the clinical community. And then they’d run their low throughput assay and publish on those 30 or 40 variants. So then on the clinical side, if you have a clinical variant that you’ve found, and it’s uncertain, you’d be looking it up and going, oh, you know, it’s not in those data, there’s no functional data on it. And so it was very, very, very patchy, the data that were available. And certainly, in terms of the methodology, there was much less rigorous methodology in terms of, you know, the true positives and true negatives used in the assay. Often these assays were not reproducible in the hands of different groups, and so forth. So it really was a little bit of a sort of a wild west and pretty patchy in terms of the availability of data.

And then Lea and colleagues published [a] BRCA1 assay, which was absolutely sort of transformative in terms of having those types of systematic, well-quantified data available to us clinically, with very systematically documented evidence of the true positives and true negatives. And subsequently, there have begun to emerge a small number, but a rapidly growing number, of similar assays across different genes. And I think that’s why the AVE has come into being: recognizing, firstly, the potential of these assays for the clinical diagnostic community in terms of reliable data by which to classify variants, but also the potential volume and complexity of these data, and the need to work in harmony internationally to ensure that we have standards, that we have consistency, and that we have sort of portals and hubs by which the data can be stored and made available.

Clare Turnbull 9:25  

In the fullness of time, we will be able to gauge which MAVEs are perfect predictors of clinical pathogenicity. And hopefully, if one were to fast forward 10 years, there will be some perfect assays, which perfectly inform clinical pathogenicity. At the moment, it’s a bit of a sort of iterative benchmarking between different types of data. And therefore, we’re trying to use different types of orthogonal data to help us inform the veracity and validity of the MAVE data, i.e., how good a predictor is it of that particular clinical pathogenicity, for that particular gene. So, yes, ideally, in the fullness of time, we will identify which MAVEs are most informative. And we can increasingly add weight to those data in terms of our variant interpretation, but we’re in the sort of interim part of the curve, where we’re just figuring out how to quantify that. So the other types of data remain very important.

Lea Starita 10:53  

To add to that with ClinGen, and I think ClinGen takes, you know, variant interpretation and clinical genetics, they’re a very broad group, we’re kind of generating one small piece of data that would feed into ClinGen. I mean, it’s not small. We’re kind of very focused on generating the data, validating the data and interpreting the data, you know, feeding that into groups like ClinGen, whereas they’re, you know, they’ve got a much broader view of all of the aspects of clinical genetics.

Clare Turnbull 11:28  

And so just to follow on from what Lea said, ClinGen, in terms of working in the ACMG variant interpretation framework, there’s a sort of central ClinGen group called the SVI. And then there are gene-specific groups called VCEPs. And then there’s this interaction of the VCEPs developing [a] sort of gene-specific guidance, and the SVI developing generic guidance. And again, a bit of an iteration and a dance between those two, in order to ensure that the generic bits of the gene-specific guidance are consistent with each other, and then that these inform the trajectory of the central SVI. And I think as Lea said, the AVE endeavour is taking one part of that, i.e., large multiplexed functional assays, and those in the community who are experts in that area, and seeking to evolve useful work around MAVEs that would then feed into the totality of what ClinGen do and how they interact with the ACMG guidance.

Angela Page 12:42  

So, you’ve pulled together a pretty broad international community. And I’m wondering if you can talk about how that has gone, why international collaboration is so important in this particular case, and sort of just what your experience is pulling this community together?

Lea Starita 12:57  

Yeah, so, the idea for the Atlas started really in, I think, about 2018. So I think this moved actually fairly quickly. And, again, I think the reason for kind of building this big tent is because, you know, we need a lot of areas of expertise. So you know, the molecular and cellular biologists who understand, you know, what the phenotypes are at the cellular level, and who can do these sorts of experiments; and data scientists who can, you know, kind of bring all this data together and help us understand it; and then clinicians like Clare, who are the end-users of the data, and who can say, well, you know, you guys should do it this way. Because sometimes from the bench, we don’t understand exactly how somebody like Clare would use the data and how she would like to see it presented in order for it to be the most useful it can be, and easiest to use. Right.

And so that’s a lot of areas of expertise. And then, of course, you know, I think the international aspect is just trying to, you know, we don’t want to be kind of, you know, either US-centric or UK-centric, because I think that cuts off a lot of where the expertise can be found. And so just gathering as many people as we can. 

Clare Turnbull 14:24  

Yeah, I mean, I think there is no reason that the interpretation of a variant in France should differ to that in the States, to that in Singapore, to that in Australia. And therefore I think in the AVE consortium, we’ve tried to bring in as many people to represent the different countries, as well as the different specialities. And still, that’s a hub and spoke, and then you want more spokes going out. You do want people bought in to using available data, believing in available data, feeling confident and understanding in it, and feeling engaged in the groups of people who have delivered it, and also delivered the guidance around how it should be used. So I think having a broad tent, both in terms of speciality, but also in terms of geographies, is important.

Angela Page 15:18  

Just leveraging the sort of global capacity that’s out there, because it is such a challenge.

Clare Turnbull 15:23  

I mean, it’s also in reality, because we’re increasingly global, that I’ll get a patient referred to my clinic who’s had genetic testing elsewhere in the world. And their aunt has been told that this is a pathogenic mutation. And then, on occasion, we have to unpick that in our diagnostic labs, and there will be occasions that we may not agree with that. And therefore, the more that we are using the same available data, using the same sets of rules, the more likely we are to be consistent in our messaging to families who share parts of the genome, and therefore we would like to be able to give them consistent information about the genetic changes that they share.

Angela Page 16:08  

Right. So that’s fabulous, I’d love to move on to that question, exactly. Like, what is the standards development activity looking like at AVE? And how do you think it relates to what’s going on at GA4GH? Is there room for, you know, for crosstalk or collaboration at all?

Lea Starita 16:22  

Definitely, and I, you know, there [are] standards that, you know, begin, you know, there’s standards around a lot of different things, right. So one of which is, you know, kind of how these experiments are done, right: as sequencing gets cheaper, it’s easier to do a lot of replicates. So you can start really understanding the variation in the experiments and understanding how trustworthy each measurement is. And that’s something we would like to kind of standardise.

And then obviously, the next thing is, you know, sharing all of the data and where the raw data goes, and what the minimum information is for publishing this, these, these datasets, you know. Even now, every now and again, you’ll get a paper where the raw data is not available. And you know, that’s kind of not acceptable, especially when somebody like Clare might want to use this data. She’s gonna want to know how trustworthy it is, right? Because it’s actually going into clinical translation. And that’s very important to us.

And then, of course, there’s the Clinical Variant Interpretation Working Group, which Clare and I are co-chairs of, and that’s, you know, formalising the way in which this data is actually translated from a functional measurement in a random cell line to the actual evidence that would be applied to say, the ACMG guidance, right, or the ACMG framework for variant interpretation. So we want to make sure again, as Clare just said, that this is kind of standard across the field, across the world. So that people aren’t coming up, using the same data and coming up with different answers.

Clare Turnbull 18:04  

Certainly, I think there’s a lot of commonality in regard to some of the elements of work that the GA4GH have progressed. And I think AVE obviously have an eye on that so that they’re not duplicating, particularly in regard to data formats and variant nomenclature, and the rigour in terms of ensuring that we name and store the data by which we describe a particular variant in a way that’s consistent across the MAVE community, but also more broadly consistent with the genomics community.

So I think those elements that we can be, we can sort of stand on the shoulders of GA4GH, and benefit from the work done already. And then I think the types of elements that Lea’s detailed in terms of standards in terms of performing the assay, or standards in terms of quantifying scores into likelihood ratios and so forth, might be a little bit more specific to this endeavour, and not so much related to prior work that GA4GH has advanced.

Angela Page 19:14  

I’m really intrigued by this, this idea of having standards so that we can deliver, you know, equal care across the world. So I’m wondering if you could speak to that a little bit; the AVE website talks about disseminating and democratizing access to the technology, but I think also to accurate variant interpretation. And so can you just talk a little bit about how AVE maybe is helping to, you know, to even decrease health disparities in the long run? Like, where are we going with this?

Clare Turnbull 19:44  

And so yes, so in terms of access to genomic testing, as of course, we all know, next-generation sequencing has dramatically reduced the cost of the laboratory element of sequencing and providing clinical genomic testing for patients, but there remains the expert manpower, and the cost thereof, in terms of interpreting the data. So again, if we can work as a global community to provide and disseminate high-quality data, and clear quantitative guidance in how that is used, that will enable a broader international group of laboratories to provide affordable genomic testing to their populations.

Lea Starita 20:34  

I think that’s great, and then, you know, democratising in, all the way back, you know, coming all the way back to, you know, even starting these sorts of experiments. So, you know, being able to help people get started, even in the field, so that, you know, they kind of understand, you know, here’s a set of protocols that they might be able to look at, to apply to their own work, here’s a set of standards for, you know, how you would want to, you know, set up the experiments, so you get trustworthy results, and here’s how you’re gonna want to share the data. So, helping people get started so that, you know, we can build more of these atlases so that we can understand more of the genome and, you know, again, trying to spread the work out across the world.

Clare Turnbull 21:21  

In a sense, I think it also democratises interpretation of genes associated with very rare phenotypes, because those are a big problem in terms of having access to sufficient clinical data to inform interpretation. But again, as Lea said, if we can open up access to and understanding of the technology to as many laboratories as possible, then you broaden out groups that will undertake MAVEs relating to these genes associated with rarer conditions, which again, you know, will help us get to the ultimate endpoint of genomics, which is helping to identify the genetic changes underlying disease in patients. And just because a patient has a really, really, really rare disease, doesn’t mean they shouldn’t be entitled to genomic understanding of it.

Angela Page 22:21  

I think that is a fantastic point to end on. So, thank you both so much. It’s really been great learning about this organisation from you both and about the technology. So thank you for being here today.

Lea Starita 22:31  

Thank you for your interest. 

Clare Turnbull 22:32  

Yes, thank you very much.

Angela Page 22:35  

Thank you for listening to the OmicsXchange, a podcast of the Global Alliance for Genomics and Health. The OmicsXchange is produced by Connor Graham, Stephanie Li, and Julia Ostmann, edited by Biljana Gajic, with music created by Rishi Nag. GA4GH is the international standards organisation for genomics aimed at accelerating human health through data sharing. I’m Angela Page, and this is the OmicsXchange.