Korean Genome Project: 1094 Korean personal genomes with clinical information

Open Access

29 May 2020

journal article
research article
Published by the American Association for the Advancement of Science (AAAS) in Science Advances

Vol. 6 (22), eaaz7835
https://doi.org/10.1126/sciadv.aaz7835

Abstract

We present the initial phase of the Korean Genome Project (Korea1K), including 1094 whole genomes (sequenced at an average depth of 31×), along with data on 79 quantitative clinical traits. We identified 39 million single-nucleotide variants and indels, of which half were singletons or doubletons, and detected Korean-specific patterns based on several types of genomic variation. A genome-wide association study illustrated the power of whole-genome sequences for analyzing clinical traits, identifying nine more significant candidate alleles than previously reported from the same linkage disequilibrium blocks. Also, when used as a reference panel, Korea1K showed better imputation accuracy for Koreans than the 1000 Genomes Project (1KGP) panel. As proof of utility, germline variants in cancer samples could be filtered out more effectively when the Korea1K variome was used as a panel of normals than when non-Korean variome sets were used. Overall, this study shows that Korea1K can be a useful genotypic and phenotypic resource for clinical and ethnogenetic studies.
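
The panel-of-normals idea mentioned above can be illustrated with a minimal sketch. The abstract does not describe the authors' actual filtering pipeline, so everything here is an assumption for illustration: the `Variant` record, the `filter_with_panel_of_normals` function, the `min_normals` threshold and the toy variants are all hypothetical and are not data from the study. The underlying logic is that a call also observed in healthy genomes is more likely a germline polymorphism than a somatic mutation. Presumably, a population-matched panel catches more of the rare germline variants that patients from that population carry.

```python
from typing import Dict, Iterable, List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record keyed by position and alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def filter_with_panel_of_normals(
    tumor_calls: Iterable[Variant],
    panel_counts: Dict[Variant, int],
    min_normals: int = 1,
) -> List[Variant]:
    """Keep only tumor calls seen in fewer than `min_normals` panel genomes.

    `panel_counts` maps each variant to the number of normal (healthy)
    genomes in the panel that carry it; variants absent from the panel
    count as 0 and are kept.
    """
    return [v for v in tumor_calls if panel_counts.get(v, 0) < min_normals]


# Toy example (hypothetical values, not Korea1K data).
panel = {
    Variant("chr1", 1000, "A", "G"): 12,  # common in the normal panel
    Variant("chr2", 5000, "C", "T"): 1,   # seen in a single normal genome
}
calls = [
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 5000, "C", "T"),
    Variant("chr3", 7000, "G", "A"),      # absent from the panel
]

print(filter_with_panel_of_normals(calls, panel))                 # keeps only the chr3 call
print(filter_with_panel_of_normals(calls, panel, min_normals=2))  # keeps the chr2 and chr3 calls
```

Raising `min_normals` trades sensitivity for specificity: a call seen in only one normal genome is retained, which may be preferable if the panel itself could contain sequencing artifacts.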

Funding Information

Ulsan National Institute of Science and Technology (1.190007.01)
Ulsan National Institute of Science and Technology (1.190033.01)
Ulsan National Institute of Science and Technology (2.180016.01)
Clinomics Inc. (internal funding)
National Center for Standard Reference Data (20003641)
