GEM Japan releases largest-ever open-access Japanese variant frequency panel


GEnome Medical alliance Japan (GEM Japan), a Driver Project of the Global Alliance for Genomics and Health (GA4GH), has released the GEM Japan Whole Genome Aggregation (GEM-J WGA), a first-of-its-kind open-access variant frequency panel of 7,609 Japanese whole genome sequences. Researchers at GEM Japan detected more than 76 million autosomal single nucleotide variations (SNVs) and more than 10 million autosomal insertions and deletions (INDELs) in the dataset. They detected a further 2 million-plus SNVs and more than 400,000 INDELs on the X chromosome.

The Japan Agency for Medical Research and Development (AMED) established GEM Japan in 2018 to facilitate genomic and health data sharing among research and medical laboratory communities, both domestically and internationally. GEM Japan aggregates genotypic and associated phenotypic information from a range of Japanese disease-specific and population genetic studies into publicly available resources in a standardized format.

“GEM-J WGA has been much encouraged by GA4GH to share Japanese allele frequency data, including rare variants of 0.01%, as a global reference to the Asian ethnic specificity,” said Hidewaki Nakagawa, M.D., Ph.D., Program Officer at AMED and the GEM Japan Driver Project Champion at GA4GH. “Our participation in GA4GH has motivated us to pursue this work through its focus on the value of ethnically diverse genomic data.” 

“In the GA4GH community, GEM Japan has become well recognized as the global leader in sharing Asian-specific data to facilitate the interpretation of human diseases and improve medical practice — not only in Japan but also around the world,” said Ewan Birney, Ph.D., Director of EMBL’s European Bioinformatics Institute and Chair of the GA4GH Steering Committee. “This selfless gift from Japan to the world will ensure that people of Japanese descent around the globe will benefit from genomic research.”

GEM Japan has established a nationwide alliance of universities, institutes, and hospitals committed to the promotion of genome-based medicine. Patient information and genomic data are securely collected and used in research to investigate pathogenic variations, risk factors, and biomarkers. The GEM-J WGA is the culmination of a collaborative collection and analysis effort within this alliance, including researchers at the Tohoku Medical Megabank Organization (ToMMo) at Tohoku University, the Iwate Tohoku Medical Megabank Organization (IMM) at Iwate Medical University, RIKEN, and The Institute of Medical Science at The University of Tokyo (BBJ).

The data were released through TogoVar, a database developed by the National Bioscience Database Center (NBDC) at the Japan Science and Technology Agency (JST) and the Database Center for Life Science (DBCLS), Joint Support-Center for Data Science Research at the Research Organization of Information and Systems. TogoVar is an integrated database of Japanese genomic variation. Following the model of the Genome Aggregation Database (gnomAD) project, the resulting Japanese gnomAD-style data will be clinically valuable for "filtering out" rare but presumably non-pathogenic variants.
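To illustrate how a population frequency panel supports this kind of filtering, here is a minimal Python sketch. It assumes a hypothetical in-memory panel keyed by (chromosome, position, reference allele, alternate allele) that stores an allele count (AC) and allele number (AN) for each variant, plus an illustrative 1% frequency cutoff. Neither the data layout nor the cutoff reflects TogoVar's actual interface or any clinical guideline.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str

def allele_frequency(ac: int, an: int) -> float:
    """Allele frequency = allele count / allele number (called alleles)."""
    if an <= 0:
        raise ValueError("allele number must be positive")
    return ac / an

def filter_common(candidates, panel, max_af=0.01):
    """Drop candidates whose frequency in the (hypothetical) panel exceeds max_af.

    panel maps (chrom, pos, ref, alt) -> (AC, AN). Variants absent from the
    panel are kept, since no population frequency is available for them.
    """
    kept = []
    for v in candidates:
        key = (v.chrom, v.pos, v.ref, v.alt)
        if key in panel:
            ac, an = panel[key]
            if allele_frequency(ac, an) > max_af:
                # Too common in the reference population to explain a rare disorder.
                continue
        kept.append(v)
    return kept

# Illustrative values only; 15,218 = 2 x 7,609 autosomal alleles if every genome is called.
panel = {
    ("1", 100, "A", "G"): (3000, 15218),  # common in the panel -> filtered out
    ("1", 200, "C", "T"): (2, 15218),     # rare in the panel -> kept
}
candidates = [Variant("1", 100, "A", "G"), Variant("1", 200, "C", "T"), Variant("2", 50, "G", "A")]
kept = filter_common(candidates, panel)
```

The design keeps variants that the panel has never observed, because absence from a 7,609-genome reference is not evidence that a variant is benign.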

Individual genome sequences mapped to reference genome sequences will also be made available under controlled-access or group-sharing arrangements through the Japanese Genotype-phenotype Archive (JGA) and the AMED Genome group sharing Database (AGD), operated in collaboration with the Bioinformation and DNA Data Bank of Japan (DDBJ) Center at the National Institute of Genetics and NBDC at JST.

“Summarized information on pathogenic or benign variants will also be made available in a standardized format that will be compatible with existing GA4GH activities, such as ClinVar,” said Nakagawa. 

In particular, information on pathogenic variations that has been collected and curated in upstream applications such as the Patient Archive (PA) is integrated into the Medical Genomics Japan Variant Database (MGeND). MGeND contains three types of formatted data:

  1. ClinVar-based representations of pathogenic annotations for monogenic disorders,
  2. summary data from genome-wide association studies (GWAS) used in meta-analyses, and
  3. HLA-disease associations and healthy control data with up to 8-digit (4-field) precision.

GEM Japan is developing a standardized format for these three types of data and annotation. These standards will be harmonized with existing GA4GH standards, in keeping with GEM Japan's commitment to the GA4GH Technical Work Streams.

GEM Japan also provides use cases for "Localization," such as adding multilingual functions for Japanese users (for example, phenotypic data collection captured with localized ontologies), and offers a model of good data-sharing practice from the perspective of a country where English is not the primary language.