10 Dec 2020
In episode 12, we spoke with Nicola Mulder from the University of Cape Town about a large-scale, collaborative effort spearheaded by H3Africa to sequence genomes from regions and countries in Africa that have historically been missed or overlooked.
Angela Page: Welcome to the OmicsXchange, I’m Angela Page. In today’s episode, I’m speaking with Professor Nicola Mulder from the University of Cape Town about a large-scale, collaborative effort spearheaded by the Human Heredity and Health in Africa Consortium, or H3Africa, to sequence genomes from regions and countries in Africa that have historically been missed or overlooked. Their key findings were recently published in the journal Nature. Nicky has been a part of the GA4GH community since its inception, and is now a Driver Project Champion of H3Africa. Welcome, Nicky.
Nicky Mulder: It’s a pleasure to be here. Thanks again for the invitation to speak.
Angela Page: To start us off, can you give us a brief overview of H3Africa and its mission?
Nicky Mulder: H3Africa is the Human Heredity and Health in Africa Consortium. It’s a consortium of a bunch of funded projects that are looking at the environmental and genetic basis for diseases of relevance to Africa. Each project is focusing on different diseases and generating different genotype data, and some of them are generating microbiome data. We found early on in the Consortium that many wanted to use a genotyping chip for genotyping their samples. And the current genotyping chips were just not appropriate because they were designed based on European populations. So we decided to design a new array. And in doing so, we then identified samples that were not represented in current whole genome sequence data, because we needed a more complete reference panel for informing that design. So about 360 samples were sequenced as part of that project, and the genome analysis working group within the Consortium, which is responsible for promoting cross-Consortium projects, worked with data providers and said, “Can we use this data to do a full in-depth analysis of African genomes?” And so we had this reference panel, and we then added additional data and used that for this particular study to further exploit that sequence data, which was a Consortium asset.
Angela Page: So in the Nature paper, H3Africa sequenced 426 individuals from 13 African countries, whose ancestries represented 50 different ethnolinguistic groups. What were some of the key findings?
Nicky Mulder: The paper has provided a lot of insights into both medical genetics and migration of African populations. So I think the key finding comes from looking at the individual populations that haven’t been sampled before, and through those, we’ve found more than 3 million previously undescribed variants. So this means that they’re not present in any of the current databases or known datasets that people have access to. There are also several regions that have been found to be under strong selection. Of these, about 62 are previously unreported. And if we look at the genes that are in these, they’re mostly involved in viral immunity, DNA repair, and metabolism. We also looked at the migration patterns and admixture and splits of populations around the continent, and found that there might be a new route of migration of the Bantu-speaking populations across the continent that people hadn’t really thought of or hadn’t discovered before. It looks like they may have gone through Zambia, which was not previously known. We’ve also looked at the pathogenic variants, medically relevant variants, and found several variants that have been previously characterized as pathogenic or non-pathogenic, which are present at reasonable frequencies in these populations. So each one of the individuals had at least one variant that was classified as pathogenic in ClinVar, with a median of about seven per person. So it has basically given new insights into the characterization of some pathogenic and medically relevant variants, but also provided information about the selection pressure in the African genomes and how the migrations might have worked in the past.
Angela Page: What factors contributed to this study being the first of its kind?
Nicky Mulder: The previous studies have mostly looked at low coverage sequence data, or even prior to that, microsatellite data. But there were still gaps in terms of countries and ethnicities not represented in those studies. And what this study does is it looks at high coverage sequence, which means you can now look at rare variants, and you can look in greater depth at the data. And it filled in gaps that were previously identified. So when sampling the populations, we specifically looked at where there was no public data already available for those countries or regions. The other thing that’s novel is that this was done by African scientists on African data. Normally there are studies on African populations, but the publications are from non-African scientists. Here, the African scientists were able to provide context and local information that was relevant. It was a really good demonstration of African-led science and collaborative science really involving the sample collectors and data providers. I mean the overall approach was a cross-Consortium collaboration. But we also tried to build capacity development into that approach, so that if there was a team that had strong expertise in a particular area, then we had more junior scientists in there who could learn along in the process.
Angela Page: Earlier, you mentioned there are gaps in representation. How did you go about selecting the populations to fill in those gaps that you had seen from previous studies?
Nicky Mulder: We did a literature search and a database search for what datasets existed for African genomes. And so we mapped these out physically on an African map, and then looked for where there were gaps. But then we also had to think about where we had people on the ground. So we then looked at people within the Consortium, and sent out a general call to say, this is where we’re looking for gaps and who has samples that you’d like to have sequenced. And then some slightly outside the Consortium, like from our IEC member, one of them had a study in Africa and they contributed samples. So it was really looking at where geographically there are gaps, and then where we had people on the ground to collect samples from those. And they had to have the right consent in place to be able to ship those samples and have the sequencing done.
Angela Page: What work has already started or still needs to be done to get an even more complete picture?
Nicky Mulder: The working group has now started looking at where the next set of gaps are. So we are doing long-read sequencing in just a few, because it’s so expensive. The paper basically highlighted about five primary ethnolinguistic, sort of ancestral groups. And so we’re trying to make sure we have representation of each of those for long-read sequencing to build a reference graph and this pangenome. And we’ve also discussed with Illumina potentially filling some of these gaps by sequencing with short-read sequencing. So these gaps include North Africa, because North Africa is very much underrepresented, and there was a reason for that, because the North African populations are quite admixed with Europeans. But they should be included, to give the broad picture. And we are also looking at the islands, so Mauritius, Madagascar, and Réunion, for example, to include the African island populations. There are also gaps in terms of the analysis. So you know, the amount of results you can put in a Nature paper is extremely small, because of the size. There was a lot more work done in each of those teams that we can expand on. And so there are a number of smaller groups that are now taking the data and diving deeper, and with the capacity that was developed, we’re now trying to build capacity in these more specific areas of analysis of these genomes.
Angela Page: So why is this paper important for the broader human genomics field?
Nicky Mulder: So what the paper has shown is that there is so much more diversity, and so many novel SNPs that are still to be discovered, just by filling the gaps. I mean we looked at whether adding new samples might reach some sort of saturation, and it never reached a plateau. So the more samples you add, the more novel SNPs you’re going to get. So the main importance, I think, for the broader human genomics field here is that we haven’t even come close to reaching some plateau of our knowledge of diversity, particularly in African populations. And we will always have novel SNPs to discover.
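Editor’s note: the saturation check described above can be illustrated with a small Python sketch. This is not the analysis used in the paper; the function names (discovery_curve, has_plateaued), the window parameter, and the toy variant IDs are all hypothetical. The idea is simply to count how many distinct variants have been seen after each sample is added and to ask whether that count has stopped growing.

```python
def discovery_curve(samples):
    """Return the cumulative count of distinct variants after each sample is added."""
    seen = set()
    curve = []
    for variant_ids in samples:
        seen.update(variant_ids)
        curve.append(len(seen))
    return curve


def has_plateaued(curve, window=2):
    """True if the last `window` samples added no previously unseen variants."""
    if len(curve) <= window:
        return False
    return curve[-1] == curve[-1 - window]


# Hypothetical toy data: each set holds the variant IDs observed in one sample.
samples = [{"v1", "v2"}, {"v2", "v3"}, {"v4"}, {"v1", "v5"}]
curve = discovery_curve(samples)
print(curve)                 # [2, 3, 4, 5]
print(has_plateaued(curve))  # False
```

In this toy input every added sample contributes at least one unseen variant, so the curve keeps rising, which mirrors the "never reached a plateau" observation in spirit only.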
Angela Page: What are the clinical implications of this study?
Nicky Mulder: Every individual had at least one ClinVar pathogenic variant, as I said, with a median of seven per person. And there were 262 unique variants that were characterized in ClinVar as either pathogenic or likely pathogenic, and of those, 21% had a minor allele frequency of more than 0.05 in at least one population. So that means that these are reasonably common in African populations. But then of these, 5% had a frequency of less than 0.05 when you looked at the combined frequency of populations in gnomAD. So what this is showing is that they are locally common in Africa, but globally rare. So the pathogenicity score and the association of pathogenicity to these variants was based very much on frequency and other data from non-African populations. So what it means is that when you bring the African populations in, these are actually found to be common, and therefore can’t be pathogenic as previously described. So I think the main clinical implication is that there are a number of possible misclassifications of variants as being pathogenic, when in fact, they are reasonably common in other populations.
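Editor’s note: the "locally common, globally rare" logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study. The Variant class, its field names, the function name, and the frequencies are all hypothetical; only the 0.05 threshold comes from the conversation.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    variant_id: str
    population_maf: dict[str, float]  # minor allele frequency per sampled population
    global_maf: float                 # combined frequency in a global reference set


def locally_common_globally_rare(variants, threshold=0.05):
    """Return IDs of variants above the threshold in at least one population
    but below it in the combined global frequency."""
    flagged = []
    for v in variants:
        common_somewhere = any(f > threshold for f in v.population_maf.values())
        if common_somewhere and v.global_maf < threshold:
            flagged.append(v.variant_id)
    return flagged


# Hypothetical toy data, not values from the paper.
variants = [
    Variant("var_A", {"pop_1": 0.12, "pop_2": 0.01}, 0.004),
    Variant("var_B", {"pop_1": 0.002, "pop_2": 0.001}, 0.001),
    Variant("var_C", {"pop_1": 0.30, "pop_2": 0.25}, 0.20),
]
print(locally_common_globally_rare(variants))  # ['var_A']
```

Variants flagged this way would be candidates for a closer look at their pathogenicity classification, since the original classification may have relied on frequencies from other populations.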
Angela Page: And how will these findings impact future research and clinical care in Africa and beyond?
Nicky Mulder: The data itself has provided a huge new resource, because when we’ve been trying to build reference panels, which is why we sequenced these in the first place, there were those gaps. So I think it will impact research by having a good, improved reference dataset for both research and for clinical data when you’re trying to filter variants. And then also for the reference panel, for example, if you’re doing genotyping arrays, it provides a good reference panel for imputation. I think it also just demonstrates that there’s still so much to be explored. And within this particular dataset itself, there’s still a lot to be explored. So there’ll be a lot more research, both in terms of genetic research and clinically relevant research. So as soon as we build cohorts of disease patients, we can then compare them within the same populations. And I think it also just demonstrated the need for further inclusion of diversity in studies and in different classification algorithms.
Angela Page: Thank you so much for speaking with me today, Nicky. It was really a great conversation.
Thank you for listening to the OmicsXchange—a podcast of the Global Alliance for Genomics and Health. The OmicsXchange podcast is produced by Stephanie Li and Caity Forgey, with music created by Rishi Nag. GA4GH is the international standards org for genomics, aimed at accelerating human health through data sharing. I’m Angela Page and this is the OmicsXchange.