Classification of DNA sequences using Bloom filters
Open Access
- 13 May 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 26 (13), 1595-1600
- https://doi.org/10.1093/bioinformatics/btq230
Abstract
Motivation: New generation sequencing technologies producing increasingly complex datasets demand new, efficient and specialized sequence analysis algorithms. Often, it is only the 'novel' sequences in a complex dataset that are of interest, and the superfluous sequences need to be removed.
Results: A novel algorithm, fast and accurate classification of sequences (FACS), is introduced that can accurately and rapidly classify sequences as belonging or not belonging to a reference sequence. FACS was first optimized and validated using a synthetic metagenome dataset. An experimental metagenome dataset was then used to show that FACS achieves accuracy comparable to that of BLAT and SSAHA2 but is at least 21 times faster in classifying sequences.
Availability: Source code for FACS, the Bloom filters and the MetaSim dataset used are available at http://facs.biotech.kth.se. The Bloom::Faster 1.6 Perl module can be downloaded from CPAN at http://search.cpan.org/~palvaro/Bloom-Faster-1.6/
Contacts: henrik.stranneheim@biotech.kth.se; joakiml@biotech.kth.se
Supplementary information: Supplementary data are available at Bioinformatics online.
This publication has 17 references indexed in Scilit:
- Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 2009
- Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, 2009
- DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature, 2008
- Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Research, 2008
- A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nature Biotechnology, 2008
- Cloning of a human parvovirus by molecular screening of respiratory tract samples. Proceedings of the National Academy of Sciences of the United States of America, 2005
- Network Applications of Bloom Filters: A Survey. Internet Mathematics, 2004
- BLAT—The BLAST-Like Alignment Tool. Genome Research, 2002
- A virus discovery method incorporating DNase treatment and its application to the identification of two bovine parvovirus species. Proceedings of the National Academy of Sciences of the United States of America, 2001
- Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 1970
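
The abstract describes FACS only at a high level, so the following is a minimal sketch of the general idea of classifying reads against a reference with a Bloom filter. It is not the FACS implementation. Everything in it is an assumption made for illustration: the use of Python rather than the Perl Bloom::Faster module, the k-mer decomposition, the names `BloomFilter`, `build_reference_filter`, `match_fraction` and `belongs_to_reference`, the SHA-256 double-hashing scheme, and the default filter size, hash count and 0.8 cutoff.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over strings, backed by a bytearray used as a bit set."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive all bit positions from two 64-bit
        # halves of a single SHA-256 digest (an illustrative choice).
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # May return a false positive, never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(seq: str, k: int):
    """Yield every overlapping substring of length k (upper-cased)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_reference_filter(reference: str, k: int,
                           num_bits: int = 1 << 20,
                           num_hashes: int = 7) -> BloomFilter:
    """Insert all k-mers of the reference sequence into a new filter."""
    bf = BloomFilter(num_bits, num_hashes)
    for kmer in kmers(reference, k):
        bf.add(kmer)
    return bf


def match_fraction(read: str, bf: BloomFilter, k: int) -> float:
    """Fraction of the read's k-mers that the filter reports as present."""
    total = 0
    hits = 0
    for kmer in kmers(read, k):
        total += 1
        if kmer in bf:
            hits += 1
    return hits / total if total else 0.0


def belongs_to_reference(read: str, bf: BloomFilter, k: int,
                         cutoff: float = 0.8) -> bool:
    """Classify a read as belonging to the reference if enough k-mers match."""
    return match_fraction(read, bf, k) >= cutoff
```

In this sketch, a read is split into k-mers, each k-mer is looked up in the filter built from the reference, and the read is assigned to the reference when the matching fraction reaches the cutoff. Because a Bloom filter can report false positives but never false negatives, a read copied exactly from the reference always scores 1.0, while an unrelated read can occasionally score above zero; the filter size and number of hash functions control how often that happens.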