Prediction of protein-protein binding site by using core interface residue and support vector machine
Open Access
- 22 December 2008
- journal article
- Published by Springer Science and Business Media LLC in BMC Bioinformatics
- Vol. 9 (1), 553
- https://doi.org/10.1186/1471-2105-9-553
Abstract
Background: Predicting protein-protein binding sites can provide structural annotation for the protein interaction data produced by proteomics studies. This is important for biological applications of protein interaction data, which is increasing rapidly. Moreover, methods for predicting protein interaction sites can provide crucial information for improving the speed and accuracy of protein docking methods.

Results: In this work, we describe a binding site prediction method that uses a new residue neighbour profile and selects only the core-interface residues for SVM training. The residue neighbour profile includes both the sequential and the spatial neighbour residues of an interface residue, which gives a more complete description of the physical and chemical characteristics surrounding the interface residue. The concept of the core interface is applied in selecting the interface residues for training the SVM models, which is shown to result in better discrimination between the core interface and other residues. The best SVM model trained was tested on a test set of 50 randomly selected proteins. The sensitivity, specificity, and MCC for the prediction of the core interface residues were 60.6%, 53.4%, and 0.243, respectively. Our prediction results on this test set were compared with those of three other binding site prediction methods, and our method was found to perform better. Furthermore, our method was tested on the 101 unbound proteins from the protein-protein interaction benchmark v2.0. The sensitivity, specificity, and MCC of this test were 57.5%, 32.5%, and 0.168, respectively.

Conclusion: By improving both the description of the interface residues and their surrounding environment and the training strategy, better SVM models were obtained and shown to outperform previous methods. Our tests on the unbound protein structures suggest that further improvement is possible.
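The abstract reports sensitivity, specificity, and MCC without stating how they were computed. As a minimal sketch, assuming the usual per-residue confusion-matrix definitions, the three measures could be derived as follows. The function names and the toy residue numbers are hypothetical and not taken from the paper. Because some interface-prediction studies use "specificity" to mean TP/(TP+FP) rather than TN/(TN+FP), and the abstract does not say which applies here, the sketch returns both.

```python
import math


def count_outcomes(predicted, actual, all_residues):
    """Count TP, FP, TN, FN from sets of residue identifiers.

    predicted: residues the classifier labels as interface
    actual: residues that truly belong to the interface
    all_residues: every residue that was scored
    """
    predicted, actual, all_residues = set(predicted), set(actual), set(all_residues)
    tp = len(predicted & actual)
    fp = len(predicted - actual)
    fn = len(actual - predicted)
    tn = len(all_residues - predicted - actual)
    return tp, fp, tn, fn


def interface_metrics(tp, fp, tn, fn):
    """Return sensitivity, both readings of 'specificity', and MCC."""

    def ratio(num, den):
        # Guard against empty classes instead of raising ZeroDivisionError.
        return num / den if den else 0.0

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),         # TP / (TP + FN)
        "true_negative_rate": ratio(tn, tn + fp),  # TN / (TN + FP)
        "precision": ratio(tp, tp + fp),           # TP / (TP + FP)
        "mcc": ratio(tp * tn - fp * fn, denom),    # Matthews correlation coefficient
    }


if __name__ == "__main__":
    # Toy example: 10 scored residues, 3 predicted as interface, 3 true interface.
    counts = count_outcomes({1, 2, 3}, {2, 3, 4}, range(1, 11))
    for name, value in interface_metrics(*counts).items():
        print(f"{name}: {value:.3f}")
```

MCC is useful here because interface residues are usually a minority of the scored residues, so a single accuracy figure would be dominated by the non-interface class.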