Prediction of protein-protein binding site by using core interface residue and support vector machine

Open Access

22 December 2008

journal article
Published by Springer Science and Business Media LLC in BMC Bioinformatics

Vol. 9 (1), 553
https://doi.org/10.1186/1471-2105-9-553

Abstract

Background

The prediction of protein-protein binding sites can provide structural annotation for the protein interaction data generated by proteomics studies. This annotation is very important for the biological application of protein interaction data, which are increasing rapidly. Moreover, methods for predicting protein interaction sites can also provide crucial information for improving the speed and accuracy of protein docking methods.

Results

In this work, we describe a binding site prediction method that uses a new residue neighbour profile and selects only the core interface residues for support vector machine (SVM) training. The residue neighbour profile includes both the sequential and the spatial neighbour residues of an interface residue, giving a more complete description of the physical and chemical characteristics surrounding that residue. The concept of the core interface is applied in selecting the interface residues used to train the SVM models, which is shown to result in better discrimination between core interface residues and other residues. The best SVM model was tested on a test set of 50 randomly selected proteins. The sensitivity, specificity, and Matthews correlation coefficient (MCC) for the prediction of the core interface residues were 60.6%, 53.4%, and 0.243, respectively. Our prediction results on this test set were compared with those of three other binding site prediction methods, and our method was found to perform better. Furthermore, our method was tested on the 101 unbound proteins from the protein-protein interaction benchmark v2.0. The sensitivity, specificity, and MCC of this test were 57.5%, 32.5%, and 0.168, respectively.

Conclusion

By improving both the description of the interface residues and their surrounding environment and the training strategy, better SVM models were obtained and shown to outperform previous methods. Our tests on the unbound protein structures suggest that further improvement is possible.
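
The abstract describes a residue neighbour profile that combines a residue's sequential and spatial neighbours but does not give its exact form. The Python sketch below shows one way such a profile could be built. The window of 3 residues, the 8 Å C-alpha distance cutoff, the amino-acid-composition encoding, and the helper name `neighbour_profile` are all illustrative assumptions, not the authors' implementation.

```python
import math

# The 20 standard amino acids in a fixed order; each half of the profile
# is a 20-element composition vector in this order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(residue_codes):
    """Fraction of each standard amino acid among the given one-letter codes.

    Returns all zeros when no standard residues are given.
    """
    counts = [0.0] * len(AMINO_ACIDS)
    for code in residue_codes:
        idx = AMINO_ACIDS.find(code)
        if idx >= 0:
            counts[idx] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def neighbour_profile(sequence, coords, i, window=3, cutoff=8.0):
    """Hypothetical neighbour profile for residue i (illustrative assumption).

    sequence: one-letter codes in chain order.
    coords:   one (x, y, z) C-alpha coordinate per residue, same order.
    window:   sequential neighbours are residues i-window..i+window, excluding i.
    cutoff:   spatial neighbours are residues outside that window whose
              C-alpha lies within `cutoff` angstroms of residue i's C-alpha.
    Returns 40 values: sequential composition followed by spatial composition.
    """
    n = len(sequence)
    lo, hi = max(0, i - window), min(n, i + window + 1)
    sequential = [sequence[j] for j in range(lo, hi) if j != i]
    spatial = [
        sequence[j]
        for j in range(n)
        if not (lo <= j < hi) and math.dist(coords[i], coords[j]) <= cutoff
    ]
    return composition(sequential) + composition(spatial)
```

Keeping the sequential and spatial halves separate lets a classifier weigh chain-local context differently from residues that are distant in sequence but close in the folded structure. A residue with no spatial neighbours beyond its window simply gets an all-zero second half.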

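The abstract reports sensitivity, specificity, and MCC without defining them. In much of the interface-prediction literature "specificity" is computed as TP/(TP+FP), the quantity elsewhere called precision, rather than the conventional TN/(TN+FP). Because the abstract does not say which is meant, this minimal Python sketch reports both variants. The function names are hypothetical, and labels are assumed to be per-residue booleans (True = interface residue).

```python
import math


def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN from parallel lists of booleans (True = interface)."""
    tp = sum(a and p for a, p in zip(actual, predicted))
    fp = sum((not a) and p for a, p in zip(actual, predicted))
    tn = sum((not a) and (not p) for a, p in zip(actual, predicted))
    fn = sum(a and (not p) for a, p in zip(actual, predicted))
    return tp, fp, tn, fn


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binding_site_metrics(actual, predicted):
    """Sensitivity, both specificity conventions, and the Matthews correlation coefficient."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": safe_div(tp, tp + fn),
        # Conventional definition: fraction of non-interface residues rejected.
        "specificity_tn": safe_div(tn, tn + fp),
        # Precision-style usage common in interface prediction papers.
        "specificity_tp": safe_div(tp, tp + fp),
        "mcc": safe_div(tp * tn - fp * fn, denom),
    }
```

For example, with TP = 2, FP = 1, TN = 4, and FN = 1, the sketch gives a sensitivity of 2/3 and an MCC of 7/15 (about 0.467). MCC is usually preferred for interface prediction because interface residues are a minority class, so raw accuracy can look high even for a trivial predictor.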