HH-suite3 for fast remote homology detection and deep protein annotation

Open Access

14 September 2019

journal article
research article
Published by Springer Science and Business Media LLC in BMC Bioinformatics

Vol. 20 (1), 1-15
https://doi.org/10.1186/s12859-019-3019-7

Abstract

HH-suite is a widely used open-source software suite for sensitive sequence similarity searches and protein fold recognition. It is based on pairwise alignment of profile hidden Markov models (HMMs), which represent multiple sequence alignments of homologous proteins. We developed a single-instruction multiple-data (SIMD) vectorized implementation of the Viterbi algorithm for profile HMM alignment and introduced various other speed-ups. These accelerated the search methods HHsearch by a factor of 4 and HHblits by a factor of 2 over the previous version 2.0.16. HHblits3 is ∼10× faster than PSI-BLAST and ∼20× faster than HMMER3. HHsearch and HHblits jobs with many query profile HMMs can be parallelized over cores and over cluster servers using OpenMP and the Message Passing Interface (MPI). The free, open-source, GPLv3-licensed software is available at https://github.com/soedinglab/hh-suite. The added functionalities and increased speed of HHsearch and HHblits should facilitate their use in large-scale protein structure and function prediction, e.g. in metagenomics and genomics projects.
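
To make the idea of Viterbi alignment between two profiles concrete, the following is a minimal scalar sketch and not the HH-suite implementation. It assumes a log-sum-of-odds column score, three alignment states, and fixed affine gap penalties; the function names, the penalty values, and the toy two-letter alphabet are all hypothetical. The SIMD vectorization described in the abstract is not reproduced here.

```python
import math

def column_score(p, q, background):
    """Log-odds score for aligning two profile columns.

    Assumed form: log2 of sum_a p(a) * q(a) / f(a), where f is the
    background frequency of residue a.
    """
    return math.log2(sum(pa * qa / fa for pa, qa, fa in zip(p, q, background)))

def viterbi_local(query, target, background, gap_open=3.0, gap_extend=0.5):
    """Best local alignment score between two profiles.

    Each profile is a list of columns; each column is a list of residue
    probabilities. Gap penalties are illustrative constants (assumption).
    """
    neg = float("-inf")
    n, m = len(query), len(target)
    # M: query column i aligned to target column j
    # X: query column i aligned to a gap in the target
    # Y: target column j aligned to a gap in the query
    M = [[neg] * (m + 1) for _ in range(n + 1)]
    X = [[neg] * (m + 1) for _ in range(n + 1)]
    Y = [[neg] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = column_score(query[i - 1], target[j - 1], background)
            # The 0.0 term lets a local alignment start at any cell.
            M[i][j] = s + max(0.0, M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - gap_open, X[i - 1][j] - gap_extend)
            Y[i][j] = max(M[i][j - 1] - gap_open, Y[i][j - 1] - gap_extend)
            best = max(best, M[i][j])
    return best

# Toy example over a two-letter alphabet (hypothetical data).
background = [0.5, 0.5]
col_a = [0.9, 0.1]
col_b = [0.1, 0.9]
profile = [col_a, col_b, col_a]
print(viterbi_local(profile, profile, background))
```

The inner loop visits every pair of columns, so the cost grows with the product of the two profile lengths. That quadratic loop is the kind of work a SIMD implementation would process several cells at a time.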

Funding Information

Horizon 2020 (685778)

