cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate
Open Access
- 2 January 2012
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 40 (9), e69
- https://doi.org/10.1093/nar/gks003
Abstract
Quantitative analyses of next-generation sequencing (NGS) data, such as the detection of copy number variations (CNVs), remain challenging. Current methods detect CNVs as changes in the depth of coverage along chromosomes. Technological or genomic variations in the depth of coverage thus lead to a high false discovery rate (FDR), even upon correction for GC content. In the context of association studies between CNVs and disease, a high FDR means many false CNVs, thereby decreasing the discovery power of the study after correction for multiple testing. We propose ‘Copy Number estimation by a Mixture Of PoissonS’ (cn.MOPS), a data processing pipeline for CNV detection in NGS data. In contrast to previous approaches, cn.MOPS incorporates modeling of depths of coverage across samples at each genomic position. Therefore, cn.MOPS is not affected by read count variations along chromosomes. Using a Bayesian approach, cn.MOPS decomposes variations in the depth of coverage across samples into integer copy numbers and noise by means of its mixture components and Poisson distributions, respectively. The noise estimate allows for reducing the FDR by filtering out detections having high noise that are likely to be false detections. We compared cn.MOPS with the five most popular methods for CNV detection in NGS data using four benchmark datasets: (i) simulated data, (ii) NGS data from a male HapMap individual with implanted CNVs from the X chromosome, (iii) data from HapMap individuals with known CNVs, (iv) high coverage data from the 1000 Genomes Project. cn.MOPS outperformed its five competitors in terms of precision (1–FDR) and recall for both gains and losses in all benchmark datasets. The software cn.MOPS is publicly available as an R package at http://www.bioinf.jku.at/software/cnmops/ and at Bioconductor.
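The core idea in the abstract, modeling read counts across samples at a single genomic position as a mixture of Poisson distributions whose components correspond to integer copy numbers, can be illustrated with a toy sketch. This is not the cn.MOPS implementation: the abstract does not give the parameterization, so the sketch assumes that the Poisson rate scales linearly with copy number (`c / 2 * lam`, with a small floor `eps` for copy number 0), and it swaps the Bayesian estimation for plain maximum-likelihood EM. The function names, the `eps` value, and the example counts are all hypothetical.

```python
import math


def poisson_logpmf(x, rate):
    """Log of the Poisson probability mass function."""
    return x * math.log(rate) - rate - math.lgamma(x + 1)


def fit_poisson_mixture(counts, copy_numbers=(0, 1, 2, 3, 4), n_iter=50, eps=0.05):
    """Fit a Poisson mixture to read counts at ONE position across samples.

    Assumption (not from the abstract): the component for copy number c
    has rate (c / 2) * lam, and eps * lam for c = 0.
    Returns (lam, mixture weights, most likely copy number per sample).
    """
    scale = [max(c / 2.0, eps) for c in copy_numbers]
    # Initialise lam at the median count, assuming most samples have copy number 2.
    lam = max(sorted(counts)[len(counts) // 2], 1e-3)
    alpha = [1.0 / len(scale)] * len(scale)
    n = len(counts)

    for _ in range(n_iter):
        # E-step: posterior responsibility of each copy number for each sample.
        resp = []
        for x in counts:
            logs = [math.log(a) + poisson_logpmf(x, s * lam)
                    for a, s in zip(alpha, scale)]
            top = max(logs)
            weights = [math.exp(v - top) for v in logs]
            total = sum(weights)
            resp.append([w / total for w in weights])

        # M-step: update mixture weights (floored to keep log() finite) and lam.
        alpha = [max(sum(r[k] for r in resp) / n, 1e-12)
                 for k in range(len(scale))]
        lam = sum(counts) / sum(r[k] * scale[k]
                                for r in resp for k in range(len(scale)))

    calls = [copy_numbers[max(range(len(scale)), key=lambda k: r[k])]
             for r in resp]
    return lam, alpha, calls


# Hypothetical read counts for eight samples at one genomic segment.
counts = [100, 98, 105, 102, 51, 99, 150, 101]
lam, alpha, calls = fit_poisson_mixture(counts)
print(calls)
```

Because the mixture is fitted across samples at a fixed position, a position where every sample has uniformly high or low coverage does not by itself produce a call; only samples that deviate from the others at that position are assigned a copy number other than 2. Without a prior, this maximum-likelihood version relies on its initialisation to anchor the copy-number-2 component.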