cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate

Open Access

2 January 2012

journal article
research article
Published by Oxford University Press (OUP) in Nucleic Acids Research

Vol. 40 (9), e69
https://doi.org/10.1093/nar/gks003

Abstract

Quantitative analyses of next-generation sequencing (NGS) data, such as the detection of copy number variations (CNVs), remain challenging. Current methods detect CNVs as changes in the depth of coverage along chromosomes. Technological or genomic variations in the depth of coverage thus lead to a high false discovery rate (FDR), even upon correction for GC content. In the context of association studies between CNVs and disease, a high FDR means many false CNVs, thereby decreasing the discovery power of the study after correction for multiple testing. We propose ‘Copy Number estimation by a Mixture Of PoissonS’ (cn.MOPS), a data processing pipeline for CNV detection in NGS data. In contrast to previous approaches, cn.MOPS incorporates modeling of depths of coverage across samples at each genomic position. Therefore, cn.MOPS is not affected by read count variations along chromosomes. Using a Bayesian approach, cn.MOPS decomposes variations in the depth of coverage across samples into integer copy numbers and noise by means of its mixture components and Poisson distributions, respectively. The noise estimate allows for reducing the FDR by filtering out high-noise detections, which are likely to be false. We compared cn.MOPS with the five most popular methods for CNV detection in NGS data using four benchmark datasets: (i) simulated data, (ii) NGS data from a male HapMap individual with implanted CNVs from the X chromosome, (iii) data from HapMap individuals with known CNVs, and (iv) high coverage data from the 1000 Genomes Project. cn.MOPS outperformed its five competitors in terms of precision (1–FDR) and recall for both gains and losses in all benchmark datasets. The software cn.MOPS is publicly available as an R package at http://www.bioinf.jku.at/software/cnmops/ and at Bioconductor.
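The idea of modeling read counts across samples at a single genomic position can be illustrated with a small sketch. The code below is not the cn.MOPS implementation: it is a plain maximum-likelihood EM fit of a Poisson mixture, without the Bayesian prior or the noise-based filtering the abstract describes. The function names, the tying of component means to `(copy number / 2) * lam`, the small `eps` factor for copy number 0, and the initial weights are all assumptions made for illustration.

```python
import math


def poisson_logpmf(x, mean):
    """Log of the Poisson probability mass function."""
    return x * math.log(mean) - mean - math.lgamma(x + 1)


def fit_poisson_mixture(counts, max_cn=4, n_iter=50, eps=0.05):
    """Fit a Poisson mixture to read counts of several samples at ONE position.

    Illustrative assumptions (not taken from the paper):
      - component i stands for integer copy number i (requires max_cn >= 2),
      - its mean is (i / 2) * lam, where lam is the expected count at copy number 2,
      - copy number 0 gets a small mean eps * lam so the Poisson stays defined.
    """
    n_comp = max_cn + 1
    scale = [eps if cn == 0 else cn / 2 for cn in range(n_comp)]

    # Start from the median count and a prior belief that most samples are diploid.
    lam = max(sorted(counts)[len(counts) // 2], 1e-3)
    weights = [0.1 / max_cn] * n_comp
    weights[2] = 0.9

    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each copy number for each sample.
        resp = []
        for x in counts:
            logp = [math.log(w) + poisson_logpmf(x, s * lam)
                    for w, s in zip(weights, scale)]
            top = max(logp)
            unnorm = [math.exp(v - top) for v in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])

        # M-step: update mixture weights (tiny pseudo-count keeps them positive)
        # and the shared diploid rate lam.
        pseudo = 1e-6
        weights = [(sum(r[i] for r in resp) + pseudo)
                   / (len(counts) + n_comp * pseudo) for i in range(n_comp)]
        expected_scale = sum(r[i] * scale[i] for r in resp for i in range(n_comp))
        lam = max(sum(counts) / expected_scale, 1e-3)

    # Most probable integer copy number per sample.
    calls = [max(range(n_comp), key=lambda i: r[i]) for r in resp]
    return weights, lam, calls


# Hypothetical read counts for ten samples in one genomic segment.
counts = [100, 98, 105, 102, 97, 151, 49, 101, 99, 103]
weights, lam, calls = fit_poisson_mixture(counts)
print(calls, round(lam, 1))
```

On these made-up counts, the sketch assigns copy number 3 to the sixth sample, copy number 1 to the seventh, and copy number 2 to the rest. Because the fit compares samples with one another at the same position, a uniformly high or low coverage at that position changes `lam` rather than the copy number calls, which is the property the abstract attributes to across-sample modeling.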

