A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data
Open Access
- 8 September 2011
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 27 (21), 2987-2993
- https://doi.org/10.1093/bioinformatics/btr509
Abstract
Motivation: Most existing methods for DNA sequence analysis rely on accurate sequences or genotypes. However, in applications of the next-generation sequencing (NGS), accurate genotypes may not be easily obtained (e.g. multi-sample low-coverage sequencing or somatic mutation discovery). These applications press for the development of new methods for analyzing sequence data with uncertainty.

Results: We present a statistical framework for calling SNPs, discovering somatic mutations, inferring population genetical parameters and performing association tests directly based on sequencing data without explicit genotyping or linkage-based imputation. On real data, we demonstrate that our method achieves comparable accuracy to alternative methods for estimating site allele count, for inferring allele frequency spectrum and for association mapping. We also highlight the necessity of using symmetric datasets for finding somatic mutations and confirm that for discovering rare events, mismapping is frequently the leading source of errors.

Availability: http://samtools.sourceforge.net

Contact: hengli@broadinstitute.org
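To make the idea of working "directly based on sequencing data without explicit genotyping" concrete, the sketch below shows one common way to estimate a site allele frequency from genotype likelihoods rather than from called genotypes. It is an illustration under stated assumptions, not the paper's implementation: samples are assumed diploid, reads are assumed independent with a single uniform base error rate, and genotype priors are assumed to follow Hardy-Weinberg equilibrium. The function names, the `err` parameter and the read counts are hypothetical.

```python
def genotype_likelihoods(n_ref, n_alt, err=0.01):
    """Return P(reads | g) for g = 0, 1, 2 copies of the alternate allele.

    Assumption: each read is an independent draw from one of the two
    chromosomes and is misread with probability `err`.
    """
    liks = []
    for g in (0, 1, 2):
        # Probability that a single read shows the alternate allele.
        p_alt = (g * (1 - err) + (2 - g) * err) / 2
        liks.append(p_alt ** n_alt * (1 - p_alt) ** n_ref)
    return liks


def em_allele_frequency(sample_liks, f=0.5, max_iter=200, tol=1e-9):
    """Estimate the alternate allele frequency by EM.

    Genotypes are never called: each sample contributes its posterior
    expected allele count, weighted by its genotype likelihoods.
    Assumption: Hardy-Weinberg genotype priors given the frequency f.
    """
    for _ in range(max_iter):
        expected_alt = 0.0
        for liks in sample_liks:
            prior = [(1 - f) ** 2, 2 * f * (1 - f), f ** 2]
            post = [p * l for p, l in zip(prior, liks)]
            total = sum(post)
            # Posterior expected number of alternate alleles (0..2).
            expected_alt += (post[1] + 2 * post[2]) / total
        new_f = expected_alt / (2 * len(sample_liks))
        converged = abs(new_f - f) < tol
        f = new_f
        if converged:
            break
    return f


# Hypothetical low-coverage data: (reference reads, alternate reads) per sample.
counts = [(4, 0), (2, 2), (0, 4), (3, 0)]
sample_liks = [genotype_likelihoods(r, a) for r, a in counts]
freq = em_allele_frequency(sample_liks)
print(f"estimated alternate allele frequency: {freq:.3f}")
```

Because every sample contributes a weighted expectation instead of a hard genotype call, samples with only a few reads still inform the estimate without forcing an uncertain call. The products of per-read probabilities would underflow at high depth; a fuller version would work in log space.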