Conpair: concordance and contamination estimator for matched tumor–normal pairs
Open Access
- 26 June 2016
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 32 (20), 3196-3198
- https://doi.org/10.1093/bioinformatics/btw389
Abstract
Motivation: Sequencing of matched tumor and normal samples is the standard study design for reliable detection of somatic alterations. However, even very low levels of cross-sample contamination significantly impact the calling of somatic mutations, because contaminant germline variants can be incorrectly interpreted as somatic. There are currently no methods based on sequence data alone that reliably estimate contamination levels in tumor samples, which frequently display copy number changes. As a solution, we developed Conpair, a tool for detection of sample swaps and cross-individual contamination in whole-genome and whole-exome tumor–normal sequencing experiments.
Results: On a ladder of in silico contaminated samples, we demonstrated that Conpair reliably measures contamination levels as low as 0.1%, even in the presence of copy number changes. We also estimated contamination levels in glioblastoma WGS and WXS tumor–normal datasets from TCGA and showed that they strongly correlate with tumor–normal concordance, as well as with the number of germline variants called as somatic by several widely used somatic callers.
Availability and Implementation: The method is available at: https://github.com/nygenome/conpair.
Contact: egrabowska@gmail.com or mczody@nygenome.org
Supplementary information: Supplementary data are available at Bioinformatics online.
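The motivation stated in the abstract, that contaminant germline variants can look like somatic mutations, can be illustrated with a small back-of-the-envelope sketch. This is not Conpair's model; it is a simplified illustration that assumes a diploid site with no copy number change, no sequencing error, and a sample whose own genotype is homozygous reference. The function names and the example depth are hypothetical.

```python
def expected_alt_fraction(contamination, contaminant_alt_copies):
    """Expected fraction of ALT reads at a site where the sample itself is
    homozygous reference and all ALT reads come from a contaminating individual.

    Assumptions (illustrative only): diploid site, no copy number change,
    no sequencing error.

    contamination: fraction of DNA from the contaminating individual (0..1).
    contaminant_alt_copies: ALT alleles carried by the contaminant (0, 1 or 2).
    """
    if not 0.0 <= contamination <= 1.0:
        raise ValueError("contamination must be between 0 and 1")
    if contaminant_alt_copies not in (0, 1, 2):
        raise ValueError("contaminant_alt_copies must be 0, 1 or 2")
    return contamination * contaminant_alt_copies / 2.0


def expected_alt_reads(depth, contamination, contaminant_alt_copies):
    """Expected number of ALT reads at the given read depth."""
    return depth * expected_alt_fraction(contamination, contaminant_alt_copies)


if __name__ == "__main__":
    depth = 100  # hypothetical tumor read depth
    for level in (0.001, 0.01, 0.05):
        for copies in (1, 2):
            reads = expected_alt_reads(depth, level, copies)
            print(f"contamination={level:.3f} alt_copies={copies} "
                  f"expected_alt_reads={reads:.2f}")
```

Under these assumptions, a few percent of contamination yields a handful of ALT reads in the tumor at sites where the contaminant carries a germline variant. If the matched normal is not contaminated in the same way, such sites can resemble low-allele-fraction somatic mutations. Copy number changes in the tumor would alter these expectations, which is the complication the abstract highlights.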