Efficient phasing and imputation of low-coverage sequencing data using large reference panels
- 7 January 2021
- journal article
- research article
- Published by Springer Science and Business Media LLC in Nature Genetics
- Vol. 53 (1), 120-126
- https://doi.org/10.1038/s41588-020-00756-0
Abstract
Low-coverage whole-genome sequencing followed by imputation has been proposed as a cost-effective genotyping approach for disease and population genetics studies. However, its competitiveness against SNP arrays is undermined because current imputation methods are computationally expensive and unable to leverage large reference panels. Here, we describe a method, GLIMPSE, for phasing and imputation of low-coverage sequencing datasets from modern reference panels. We demonstrate its remarkable performance across different coverages and human populations. GLIMPSE achieves imputation of a genome for less than US$1 in computational cost, considerably outperforming other methods and improving imputation accuracy over the full allele frequency range. As a proof of concept, we show that 1× coverage enables effective gene expression association studies and outperforms dense SNP arrays in rare variant burden tests. Overall, this study illustrates the promising potential of low-coverage imputation and suggests a paradigm shift in the design of future genomic studies.
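The abstract does not describe GLIMPSE's internal model. This minimal sketch only illustrates why 1× data need imputation at all. It rests on two assumptions that are not taken from the paper: read depth follows a Poisson distribution, and reads are independent with a hypothetical symmetric per-base error rate of 1%. The function names are illustrative and are not part of GLIMPSE.

```python
import math


def poisson_zero_coverage(mean_depth: float) -> float:
    """Expected fraction of sites with no reads, assuming depth ~ Poisson(mean_depth)."""
    return math.exp(-mean_depth)


def genotype_likelihoods(n_ref: int, n_alt: int, err: float = 0.01) -> list[float]:
    """P(reads | genotype) for a diploid site carrying 0, 1 or 2 ALT alleles.

    Assumes independent reads and a symmetric per-base error rate `err`
    (a hypothetical value). The binomial coefficient is omitted because it is
    the same for every genotype and cancels on normalisation.
    """
    likelihoods = []
    for alt_copies in (0, 1, 2):
        alt_fraction = alt_copies / 2  # share of the two chromosomes carrying ALT
        # Probability that a single read reports ALT, allowing for sequencing error
        q = alt_fraction * (1 - err) + (1 - alt_fraction) * err
        likelihoods.append((1 - q) ** n_ref * q ** n_alt)
    return likelihoods


if __name__ == "__main__":
    # Roughly a third of sites receive no read at all at 1x mean depth
    print(f"{poisson_zero_coverage(1.0):.3f}")  # 0.368
    # A site covered by a single REF read, normalised under a flat genotype prior
    gl = genotype_likelihoods(n_ref=1, n_alt=0)
    total = sum(gl)
    print([round(x / total, 3) for x in gl])
```

Under these assumptions, a single REF read still leaves the heterozygous genotype about one-third probable. That residual ambiguity, plus the sites that receive no reads, is what haplotype information from a reference panel is meant to resolve during imputation.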
Funding Information
- Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (PP00P3_176977)