Evaluation of features for catalytic residue prediction in novel folds

1 February 2007

journal article
research article
Published by Wiley in Protein Science

Vol. 16 (2), 216-226
https://doi.org/10.1110/ps.062523907

Abstract

Structural genomics projects are determining the three-dimensional structure of proteins without full characterization of their function. A critical part of the annotation process involves appropriate knowledge representation and prediction of functionally important residue environments. We have developed a method to extract features from sequence, sequence alignments, three-dimensional structure, and structural environment conservation, and used support vector machines to annotate homologous and nonhomologous residue positions based on a specific training set of residue functions. In order to evaluate this pipeline for automated protein annotation, we applied it to the challenging problem of prediction of catalytic residues in enzymes. We also ranked the features based on their ability to discriminate catalytic from noncatalytic residues. When applying our method to a well-annotated set of protein structures, we found that top-ranked features were a measure of sequence conservation, a measure of structural conservation, a degree of uniqueness of a residue's structural environment, solvent accessibility, and residue hydrophobicity. We also found that features based on structural conservation were complementary to those based on sequence conservation and that they were capable of increasing predictor performance. Using a family nonredundant version of the ASTRAL 40 v1.65 data set, we estimated that the true catalytic residues were correctly predicted in 57.0% of the cases, with a precision of 18.5%. When testing on proteins containing novel folds not used in training, the best features were highly correlated with the training on families, thus validating the approach to nonhomologous catalytic residue prediction in general. We then applied the method to 2781 coordinate files from the structural genomics target pipeline and identified both highly ranked and highly clustered groups of predicted catalytic residues.
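
To make the reported performance figures concrete, the following minimal sketch treats the 57.0% figure as recall (the fraction of true catalytic residues recovered) and the 18.5% figure as precision (the fraction of predicted catalytic residues that are correct). The counts used here are hypothetical, scaled to 100 true catalytic residues purely for illustration; they are not taken from the paper, and the function names are assumptions of this sketch.

```python
def recall(tp: float, fn: float) -> float:
    """Fraction of true catalytic residues that were predicted as catalytic."""
    return tp / (tp + fn)


def precision(tp: float, fp: float) -> float:
    """Fraction of predicted catalytic residues that are truly catalytic."""
    return tp / (tp + fp)


def implied_false_positives(tp: float, prec: float) -> float:
    """Solve prec = tp / (tp + fp) for fp."""
    return tp * (1.0 / prec - 1.0)


# Hypothetical scale: 100 true catalytic residues, 57 of them recovered.
tp, fn = 57, 43
fp = implied_false_positives(tp, 0.185)

print(f"recall    = {recall(tp, fn):.3f}")
print(f"precision = {precision(tp, fp):.3f}")
print(f"false positives per {tp} true positives: {fp:.1f}")
```

Under these assumptions, each correctly predicted catalytic residue comes with roughly four to five false positives, which is consistent with catalytic residues being a small minority of all residues in an enzyme.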
