HH-suite3 for fast remote homology detection and deep protein annotation
Open Access
- 14 September 2019
- journal article
- research article
- Published by Springer Science and Business Media LLC in BMC Bioinformatics
- Vol. 20 (1), 1-15
- https://doi.org/10.1186/s12859-019-3019-7
Abstract
HH-suite is a widely used open-source software suite for sensitive sequence similarity searches and protein fold recognition. It is based on pairwise alignment of profile Hidden Markov models (HMMs), which represent multiple sequence alignments of homologous proteins. We developed a single-instruction multiple-data (SIMD) vectorized implementation of the Viterbi algorithm for profile HMM alignment and introduced various other speed-ups. These accelerated the search methods HHsearch by a factor of 4 and HHblits by a factor of 2 over the previous version 2.0.16. HHblits3 is ∼10× faster than PSI-BLAST and ∼20× faster than HMMER3. Jobs to perform HHsearch and HHblits searches with many query profile HMMs can be parallelized over cores and over cluster servers using OpenMP and message passing interface (MPI). The free, open-source, GPLv3-licensed software is available at https://github.com/soedinglab/hh-suite. The added functionalities and increased speed of HHsearch and HHblits should facilitate their use in large-scale protein structure and function prediction, e.g. in metagenomics and genomics projects.
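To make the idea of Viterbi alignment between two profiles more concrete, here is a minimal scalar sketch in Python. It is not HH-suite's implementation and it is not SIMD-vectorized. It assumes a toy 4-letter alphabet with a uniform background, a log-sum-of-odds column score, a single match state, and a fixed gap penalty in place of position-specific transition probabilities. The names `column_score` and `local_viterbi_score` are hypothetical.

```python
import math

# Assumption: toy 4-letter alphabet with uniform background frequencies.
# Real protein profiles use 20 amino acids and empirical backgrounds.
BACKGROUND = [0.25, 0.25, 0.25, 0.25]


def column_score(q_col, t_col, background=BACKGROUND):
    """Log-sum-of-odds score (in bits) between two profile columns."""
    odds = sum(q * t / f for q, t, f in zip(q_col, t_col, background))
    # Columns with no shared residue probability cannot be aligned.
    return math.log2(odds) if odds > 0 else -math.inf


def local_viterbi_score(query, target, gap_penalty=1.0):
    """Best local alignment score between two profiles.

    Simplification (assumption): one match state and a fixed gap
    penalty per skipped column, instead of per-position transitions.
    """
    prev = [0.0] * (len(target) + 1)
    best = 0.0
    for q_col in query:
        curr = [0.0] * (len(target) + 1)
        for j, t_col in enumerate(target, start=1):
            curr[j] = max(
                0.0,                                      # start a new local alignment
                prev[j - 1] + column_score(q_col, t_col),  # align the two columns
                prev[j] - gap_penalty,                    # gap in the target
                curr[j - 1] - gap_penalty,                # gap in the query
            )
            best = max(best, curr[j])
        prev = curr
    return best


# Two fully conserved toy columns.
col_a = [1.0, 0.0, 0.0, 0.0]
col_b = [0.0, 1.0, 0.0, 0.0]
score = local_viterbi_score([col_a, col_b], [col_a, col_b])
```

Each cell depends only on its left, upper, and upper-left neighbours, so the same recurrence can be evaluated for several independent alignments in lock-step. That regularity is what makes this kind of dynamic programming a natural candidate for SIMD instructions.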
Funding Information
- Horizon 2020 (685778)