Search and clustering orders of magnitude faster than BLAST

12 August 2010

journal article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 26 (19), 2460-2461
https://doi.org/10.1093/bioinformatics/btq461

Abstract

Motivation: Biological sequence data is accumulating rapidly, motivating the development of improved high-throughput methods for sequence classification.

Results: UBLAST and USEARCH are new algorithms enabling sensitive local and global search of large sequence databases at exceptionally high speeds. They are often orders of magnitude faster than BLAST in practical applications, though sensitivity to distant protein relationships is lower. UCLUST is a new clustering method that exploits USEARCH to assign sequences to clusters. UCLUST offers several advantages over the widely used program CD-HIT, including higher speed, lower memory use, improved sensitivity, clustering at lower identities and classification of much larger datasets.

Availability: Binaries are available at no charge for non-commercial use at http://www.drive5.com/usearch

Contact: robert@drive5.com

Supplementary information: Supplementary data are available at Bioinformatics online.
