2-kupl: mapping-free variant detection from DNA-seq data of matched samples
Open Access
- 5 June 2021
- journal article
- research article
- Published by Springer Science and Business Media LLC in BMC Bioinformatics
- Vol. 22 (1), 1-22
- https://doi.org/10.1186/s12859-021-04185-6
Abstract
Background: The detection of genome variants, including point mutations, indels and structural variants, is a fundamental and challenging computational problem. We address here the problem of variant detection between two deep-sequencing (DNA-seq) samples, such as two human samples from an individual patient, or two samples from distinct bacterial strains. The preferred strategy in such a case is to align each sample to a common reference genome, collect all variants and compare these variants between samples. Such mapping-based protocols have several limitations. DNA sequences with large indels, aggregated mutations and structural variants are hard to map to the reference. Furthermore, DNA sequences cannot be mapped reliably to genomic low complexity regions and repeats.
Results: We introduce 2-kupl, a k-mer based, mapping-free protocol to detect variants between two DNA-seq samples. On simulated and actual data, 2-kupl achieves higher accuracy than other mapping-free protocols. Applying 2-kupl to prostate cancer whole exome sequencing data, we identify a number of candidate variants in hard-to-map regions and propose potential novel recurrent variants in this disease.
Conclusions: We developed a mapping-free protocol for variant calling between matched DNA-seq samples. Our protocol is suitable for variant detection in unmappable genome regions or in the absence of a reference genome.
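Illustration: the general idea behind a k-mer based, mapping-free comparison of two samples can be sketched in a few lines of Python. This is not the 2-kupl implementation, whose algorithm is not described in the abstract. It is a minimal sketch assuming that reads are plain strings, that k-mers are counted in canonical form (the lexicographically smaller of a k-mer and its reverse complement), and that a k-mer is called sample-specific when it is seen at least `min_count` times in one sample and never in the other. All function names and the parameters `k` and `min_count` are hypothetical.

```python
from collections import Counter

# Translation table for complementing DNA bases (assumes upper-case ACGT).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def count_kmers(reads, k):
    """Count canonical k-mers over all reads, skipping k-mers that contain N."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[canonical(kmer)] += 1
    return counts


def case_specific_kmers(case_reads, control_reads, k=31, min_count=2):
    """Return k-mers seen at least min_count times in the case sample
    and absent from the control sample (hypothetical filtering rule)."""
    case = count_kmers(case_reads, k)
    control = count_kmers(control_reads, k)
    return {kmer for kmer, n in case.items()
            if n >= min_count and kmer not in control}


if __name__ == "__main__":
    # Toy data: the case reads carry a single substitution (A -> C).
    control_reads = ["AAAAAA"]
    case_reads = ["AAACAAA", "AAACAAA"]
    print(sorted(case_specific_kmers(case_reads, control_reads, k=3)))
```

In this toy setting the k-mers overlapping the substitution are the ones reported, which is why sample-specific k-mers can point to candidate variants without aligning reads to a reference. A real protocol would additionally need to handle sequencing errors, coverage differences and the assembly of overlapping k-mers into longer variant contigs.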
Funding Information
- Agence Nationale de la Recherche, France (ANR-18-CE45-0020)
- Annoroad Technology, Beijing