CONTRA: copy number analysis for targeted resequencing
Open Access
- 2 April 2012
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 28 (10), 1307-1313
- https://doi.org/10.1093/bioinformatics/bts146
Abstract
Motivation: In light of the increasing adoption of targeted resequencing (TR) as a cost-effective strategy to identify disease-causing variants, a robust method for copy number variation (CNV) analysis is needed to maximize the value of this promising technology.
Results: We present a method for CNV detection for TR data, including whole-exome capture data. Our method calls copy number gains and losses for each target region based on normalized depth of coverage. Our key strategies include the use of base-level log-ratios to remove GC-content bias, correction for an imbalanced library size effect on log-ratios, and the estimation of log-ratio variations via binning and interpolation. Our methods are made available via CONTRA (COpy Number Targeted Resequencing Analysis), a software package that takes standard alignment formats (BAM/SAM) and outputs in variant call format (VCF4.0), for easy integration with other next-generation sequencing analysis packages. We assessed our methods using samples from seven different target enrichment assays, and evaluated our results using simulated data and real germline data with known CNV genotypes.
Availability and implementation: Source code and sample data are freely available under GNU license (GPLv3) at http://contra-cnv.sourceforge.net/
Contact: Jason.Li@petermac.org
Supplementary information: Supplementary data are available at Bioinformatics online.
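To illustrate the idea of calling gains and losses from normalized depth of coverage, here is a minimal sketch of per-base log-ratios averaged per target region. It is not CONTRA's implementation: the function names, the `min_depth` threshold, and the simple total-coverage scaling are assumptions made for illustration, and the variance estimation by binning and interpolation described in the abstract is omitted.

```python
import math


def library_scale(test_depths, control_depths):
    """Factor that rescales test coverage to the control's total coverage.

    Assumption: library size is approximated by the total per-base depth
    over all target regions, which is a deliberate simplification.
    """
    total_test = sum(sum(d) for d in test_depths.values())
    total_control = sum(sum(d) for d in control_depths.values())
    return total_control / total_test


def region_log_ratios(test_depths, control_depths, min_depth=10):
    """Mean per-base log2(test / control) for each target region.

    Both inputs map a region name to a list of per-base depths of equal
    length. Bases below `min_depth` in either sample are skipped, and a
    region with no usable bases is left out of the result.
    """
    scale = library_scale(test_depths, control_depths)
    result = {}
    for region, test in test_depths.items():
        control = control_depths[region]
        ratios = [
            math.log2((t * scale) / c)
            for t, c in zip(test, control)
            if t >= min_depth and c >= min_depth
        ]
        if ratios:
            result[region] = sum(ratios) / len(ratios)
    return result


if __name__ == "__main__":
    # Hypothetical depths: region D has half the test coverage of A, B and C.
    test = {"A": [200, 200], "B": [200, 200], "C": [200, 200], "D": [100, 100]}
    control = {name: [100, 100] for name in test}
    for name, value in region_log_ratios(test, control).items():
        print(name, round(value, 3))
```

In this toy input the log-ratio of region D sits one log2 unit below the other regions, which is what a single-copy loss in a diploid sample would look like. Note that the unaffected regions do not land exactly at zero, because the loss itself lowers the test sample's total coverage; this is the kind of imbalanced library size effect the abstract says the method corrects for.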