Accelerated Profile HMM Searches
Top Cited Papers
Open Access
- 20 October 2011
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 7 (10), e1002195
- https://doi.org/10.1371/journal.pcbi.1002195
Abstract
Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call “sparse rescaling”. These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches. Searching sequence databases is one of the most important applications in computational molecular biology. The main workhorse in the field is the BLAST suite of programs. Since the introduction of BLAST in the 1990's, important theoretical advances in homology search methodology have been made using probabilistic inference methods and hidden Markov models (HMMs). However, previous software implementations of these newer probabilistic methods were slower than BLAST by about 100-fold. This hindered their utility, because computation speed is so critical with the rapidly increasing size of modern sequence databases. Here I describe the acceleration methods I implemented in a new, freely available profile HMM software package, HMMER3. HMMER3 makes profile HMM searches about as fast as BLAST, while retaining the power of using probabilistic inference technology.Keywords
This publication has 42 references indexed in Scilit:
- Ongoing and future developments at the Universal Protein ResourceNucleic Acids Research, 2010
- Hidden Markov model speed heuristic and iterative HMM search procedureBMC Bioinformatics, 2010
- BLAST+: architecture and applicationsBMC Bioinformatics, 2009
- The Pfam protein families databaseNucleic Acids Research, 2009
- InterPro: the integrative protein signature databaseNucleic Acids Research, 2008
- Exploring genomic dark matter: A critical assessment of the performance of homology search methods on noncoding RNAGenome Research, 2006
- Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searchesNucleic Acids Research, 2006
- Protein database searches using compositionally adjusted substitution matricesThe FEBS Journal, 2005
- Gapped BLAST and PSI-BLAST: a new generation of protein database search programsNucleic Acids Research, 1997
- A tutorial on hidden Markov models and selected applications in speech recognitionProceedings of the IEEE, 1989