Centrifuge: rapid and sensitive classification of metagenomic sequences
Open Access
- 24 May 2016
- preprint content
- Published by Cold Spring Harbor Laboratory
- p. 054965
- https://doi.org/10.1101/054965
Abstract
Centrifuge is a novel microbial classification engine that enables rapid, accurate and sensitive labeling of reads and quantification of species on desktop computers. The system uses an indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge requires a relatively small index (4.2 GB for 4,078 bacterial and 200 archaeal genomes) and classifies sequences at very high speed, allowing it to process the millions of reads from a typical high-throughput DNA sequencing run within a few minutes. Together these advances enable timely and accurate analysis of large metagenomics data sets on conventional desktop computers. Because of its space-optimized indexing schemes, Centrifuge also makes it possible to index the entire NCBI non-redundant nucleotide sequence database (a total of 109 billion bases) with an index size of 69 GB, in contrast to k-mer based indexing schemes, which require far more extensive space. Centrifuge is available as free, open-source software from www.ccb.jhu.edu/software/centrifuge.
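To make the BWT and FM-index idea concrete, the following is a minimal, uncompressed sketch of exact-match lookup by backward search. It is not Centrifuge's implementation: the abstract describes a space-optimized index tailored to classification, whereas this toy builds the suffix array naively and stores full occurrence tables. The function names `build_fm_index` and `find` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM index for `text`: suffix array, C table, occurrence table."""
    text += "$"  # sentinel; '$' sorts before A, C, G, T in ASCII
    # Naive suffix-array construction: fine for a demo, far too slow for genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT = character preceding each sorted suffix (text[-1] is '$' when i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    # c_table[ch] = number of characters in text strictly smaller than ch.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][i] = number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (1 if ch == b else 0)
    return sa, c_table, occ


def find(pattern, sa, c_table, occ):
    """Return sorted start positions of exact matches of `pattern`."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    # Backward search: extend the match one character at a time, right to left.
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```

Each pattern character costs a constant number of table lookups, so query time depends on the read length rather than on the size of the indexed text. That property is what makes this family of indexes attractive for classifying millions of reads.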