Strelka: accurate somatic small-variant calling from sequenced tumor–normal sample pairs
Open Access
- 10 May 2012
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 28 (14), 1811-1817
- https://doi.org/10.1093/bioinformatics/bts271
Abstract
Motivation: Whole genome and exome sequencing of matched tumor–normal sample pairs is becoming routine in cancer research. The consequent increased demand for somatic variant analysis of paired samples requires methods specialized to model this problem so as to sensitively call variants at any practical level of tumor impurity.
Results: We describe Strelka, a method for somatic SNV and small indel detection from sequencing data of matched tumor–normal samples. The method uses a novel Bayesian approach which represents continuous allele frequencies for both tumor and normal samples, while leveraging the expected genotype structure of the normal. This is achieved by representing the normal sample as a mixture of germline variation with noise, and representing the tumor sample as a mixture of the normal sample with somatic variation. A natural consequence of the model structure is that sensitivity can be maintained at high tumor impurity without requiring purity estimates. We demonstrate that the method has superior accuracy and sensitivity on impure samples compared with approaches based on either diploid genotype likelihoods or general allele-frequency tests.
Availability: The Strelka workflow source code is available at ftp://strelka@ftp.illumina.com/.
Contact: csaunders@illumina.com
Supplementary information: Supplementary data are available at Bioinformatics online.
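The abstract's point about impure samples can be illustrated with a toy calculation. The sketch below is not Strelka's model; it is a plain binomial read-count likelihood with a symmetric per-read error rate, and every function name and parameter value in it is a hypothetical chosen for illustration. It compares how well a diploid genotype grid (allele frequency 0, 0.5 or 1) and a continuous allele-frequency grid explain a low-fraction variant, such as a heterozygous somatic variant in a tumor sample of about 20% purity.

```python
import math


def expected_somatic_vaf(purity: float) -> float:
    """Expected alt-allele fraction of a heterozygous somatic variant in a
    diploid region, given tumor purity (illustrative assumption)."""
    return 0.5 * purity


def binom_loglik(alt: int, depth: int, freq: float, err: float = 0.01) -> float:
    """Binomial log-likelihood of seeing `alt` alt reads out of `depth`.

    `err` is an assumed symmetric per-read error rate, not a Strelka parameter.
    """
    # Probability that a single read shows the alt allele.
    q = freq * (1.0 - err) + (1.0 - freq) * err
    log_choose = (math.lgamma(depth + 1) - math.lgamma(alt + 1)
                  - math.lgamma(depth - alt + 1))
    return log_choose + alt * math.log(q) + (depth - alt) * math.log(1.0 - q)


def best_diploid(alt: int, depth: int):
    """Best (log-likelihood, frequency) over diploid genotype frequencies."""
    return max((binom_loglik(alt, depth, f), f) for f in (0.0, 0.5, 1.0))


def best_continuous(alt: int, depth: int, steps: int = 100):
    """Best (log-likelihood, frequency) over an evenly spaced frequency grid."""
    return max((binom_loglik(alt, depth, i / steps), i / steps)
               for i in range(steps + 1))


if __name__ == "__main__":
    # 6 alt reads out of 60, roughly what 20% purity would produce.
    print("expected VAF at 20% purity:", expected_somatic_vaf(0.2))
    print("diploid best (loglik, freq):", best_diploid(6, 60))
    print("continuous best (loglik, freq):", best_continuous(6, 60))
```

In this toy setting the diploid grid is forced to explain the six alt reads as sequencing error on a homozygous-reference site, while the continuous grid fits a frequency near the observed fraction with a much higher likelihood. That is the intuition behind modelling continuous allele frequencies, although the published method adds the joint tumor–normal structure described in the abstract, which this sketch omits.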