Matt: Local Flexibility Aids Protein Multiple Structure Alignment

Open Access

11 January 2008

journal article
research article
Published by Public Library of Science (PLoS) in PLoS Computational Biology

Vol. 4 (1), e10
https://doi.org/10.1371/journal.pcbi.0040010

Abstract

Even when there is agreement on what measure a protein multiple structure alignment should be optimizing, finding the optimal alignment is computationally prohibitive. One approach used by many previous methods is aligned fragment pair chaining, where short structural fragments from all the proteins are aligned against each other optimally, and the final alignment chains these together in geometrically consistent ways. Ye and Godzik have recently suggested that adding geometric flexibility may help better model protein structures in a variety of contexts. We introduce the program Matt (Multiple Alignment with Translations and Twists), an aligned fragment pair chaining algorithm that, in intermediate steps, allows local flexibility between fragments: small translations and rotations are temporarily allowed to bring sets of aligned fragments closer, even if they are physically impossible under rigid body transformations. After a dynamic programming assembly guided by these “bent” alignments, geometric consistency is restored in the final step before the alignment is output. Matt is tested against other recent multiple protein structure alignment programs on the popular Homstrad and SABmark benchmark datasets. Matt's global performance is competitive with that of the other programs on Homstrad, and Matt outperforms them on SABmark, a benchmark of multiple structure alignments of proteins with more distant homology. On both datasets, Matt demonstrates an ability to better align the ends of α-helices and β-strands, an important characteristic of any structure alignment program intended to help construct a structural template library for threading approaches to the inverse protein-folding problem. The related question of whether Matt alignments can be used to distinguish distantly homologous structure pairs from pairs of proteins that are not homologous is also considered. For this purpose, a p-value score based on the length of the common core and average root mean squared deviation (RMSD) of Matt alignments is shown to largely separate decoys from homologous protein structures in the SABmark benchmark dataset. We postulate that Matt's strong performance comes from its ability to model proteins in different conformational states and, perhaps even more important, its ability to model backbone distortions in more distantly related proteins.

Proteins fold into complicated, highly asymmetrical 3-D shapes. When a protein is found to fold into a shape that is sufficiently similar to that of other proteins whose functional roles are known, this can significantly aid in predicting the function of the new protein. In addition, the areas where structure is highly conserved in a set of such similar proteins may indicate functional or structural importance of the conserved region. Given a set of protein structures, the protein structural alignment problem is to determine the superimposition of the backbones of these protein structures that places as much of the structures as possible into close spatial alignment. We introduce an algorithm that allows local flexibility in the structures when it brings them into closer alignment. The algorithm performs as well as its competitors when the structures to be aligned are highly similar, and outperforms them by an increasingly large margin as similarity decreases. In addition, for the related classification problem that asks whether the degree of structural similarity between two proteins implies that they likely evolved from a common ancestor, a scoring function assesses, based on the best alignment generated for each pair of protein structures, whether they should be declared sufficiently structurally similar or not. This score can be used to predict when two proteins have sufficiently similar shapes to likely share functional characteristics.
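The chaining step mentioned in the abstract can be illustrated with a small dynamic program. The sketch below is not Matt's implementation: it only shows the generic idea of chaining aligned fragment pairs between two proteins, assuming each pair already carries a precomputed score. The names `FragmentPair` and `chain_fragment_pairs` are hypothetical, and the sketch deliberately omits the geometric terms (the temporary translations and twists, and the final restoration of geometric consistency) that the abstract describes.

```python
from typing import List, NamedTuple, Tuple


class FragmentPair(NamedTuple):
    """Hypothetical aligned fragment pair: `length` residues starting at
    a_start in protein A matched to `length` residues starting at b_start
    in protein B, with an assumed precomputed similarity score."""
    a_start: int
    b_start: int
    length: int
    score: float


def chain_fragment_pairs(
    pairs: List[FragmentPair],
) -> Tuple[float, List[FragmentPair]]:
    """Return the highest-scoring chain of fragment pairs that do not
    overlap and that keep the same residue order in both proteins."""
    if not pairs:
        return 0.0, []

    # Sorting by start position guarantees that every valid predecessor
    # of a pair appears earlier in the list.
    ordered = sorted(pairs, key=lambda p: (p.a_start, p.b_start))
    best = [p.score for p in ordered]  # best chain score ending at pair i
    prev = [-1] * len(ordered)         # back-pointer for traceback

    for i, cur in enumerate(ordered):
        for j in range(i):
            p = ordered[j]
            # p may precede cur only if it ends before cur starts
            # in both proteins.
            ends_before = (
                p.a_start + p.length <= cur.a_start
                and p.b_start + p.length <= cur.b_start
            )
            if ends_before and best[j] + cur.score > best[i]:
                best[i] = best[j] + cur.score
                prev[i] = j

    # Trace back from the best-scoring chain end.
    k = max(range(len(ordered)), key=best.__getitem__)
    total = best[k]
    chain = []
    while k != -1:
        chain.append(ordered[k])
        k = prev[k]
    chain.reverse()
    return total, chain


# Toy input with made-up scores, for illustration only.
example = [
    FragmentPair(0, 0, 5, 5.0),
    FragmentPair(3, 4, 5, 6.0),
    FragmentPair(5, 6, 4, 4.0),
    FragmentPair(10, 2, 5, 7.0),
    FragmentPair(10, 11, 5, 5.0),
]
total_score, best_chain = chain_fragment_pairs(example)
```

On this toy input the chain keeps the pairs starting at residues 0, 5, and 10 of protein A, for a total score of 14.0. The pair at (10, 2) has the highest individual score but is excluded because it would break residue order in protein B. A flexible method in the spirit of the abstract would additionally score how well consecutive fragments fit together geometrically, which this sketch leaves out.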

This publication has 50 references indexed in Scilit:

A Parameterized Algorithm for Protein Structure Alignment
Journal of Computational Biology, 2007
Comprehensive Evaluation of Protein Structure Alignment Methods: Scoring by Geometric Measures
Journal of Molecular Biology, 2005
Multiple flexible structure alignment using partial order graphs
Bioinformatics, 2005
SABmark—a benchmark for sequence alignment that covers the entire known fold space
Bioinformatics, 2004
3DCoffee: Combining Protein Sequences and Structures within Multiple Sequence Alignments
Journal of Molecular Biology, 2004
Multiple structural alignment by secondary structures: Algorithm and applications
Protein Science, 2003
Comprehensive assessment of automatic structural alignment against a manual standard, the scop classification of proteins
Protein Science, 1998
Do aligned sequences share the same fold?
Journal of Molecular Biology, 1997
SWISS‐MODEL and the Swiss‐Pdb Viewer: An environment for comparative protein modeling
Electrophoresis, 1997
Enlarged representative set of protein structures
Protein Science, 1994

Cited by 180 articles