Predicting Protein-Protein Interactions from Primary Protein Sequences Using a Novel Multi-Scale Local Feature Representation Scheme and the Random Forest

Open Access

6 May 2015

journal article
research article
Published by Public Library of Science (PLoS) in PLOS ONE

Vol. 10 (5), e0125811
https://doi.org/10.1371/journal.pone.0125811

Abstract

The study of protein-protein interactions (PPIs) is important for understanding biological cellular functions. However, detecting PPIs in the laboratory is both time-consuming and expensive. For this reason, much recent effort has gone into developing techniques for the computational prediction of PPIs, which can complement laboratory procedures and provide an inexpensive way of predicting the most likely set of interactions at the scale of the entire proteome. Although much progress has been made in this direction, the problem is still far from solved, and more effective approaches are required to overcome the limitations of current ones. In this study, a novel Multi-scale Local Descriptor (MLD) feature representation scheme is proposed to extract features from a protein sequence. This scheme captures multi-scale local information by varying the length of protein-sequence segments. Based on the MLD, an ensemble learning method, the Random Forest (RF), is used as the classifier. The MLD feature representation scheme facilitates the mining of interaction information from multi-scale continuous amino acid segments, making it easier to capture multiple overlapping continuous binding patterns within a protein sequence. When the proposed method is tested on the PPI data of Saccharomyces cerevisiae, it achieves a prediction accuracy of 94.72% with 94.34% sensitivity at a precision of 98.91%. Extensive experiments are performed to compare our method with existing sequence-based methods. The experimental results show that our predictor also outperforms several other state-of-the-art predictors on the H. pylori dataset. These results can largely be credited to the learning capabilities of the RF model and the novel MLD feature representation scheme. Overall, the experiments suggest that the proposed approach is very promising for predicting PPIs and can be a useful tool for future proteomic studies.
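
The idea of describing a sequence through continuous segments of varying length can be illustrated with a minimal sketch. The abstract does not give the details of the MLD scheme, so everything concrete here is an assumption made for illustration: the choice of four equal regions, the inclusion of the full sequence as the largest segment, the plain 20-letter amino acid composition used as the per-segment descriptor, and all function names (`split_regions`, `multiscale_segments`, `composition`, `multiscale_features`, `pair_features`). It should not be read as the authors' implementation.

```python
from collections import Counter

# The 20 standard amino acids, used for a simple composition descriptor.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def split_regions(seq, n_regions=4):
    """Split seq into n_regions contiguous pieces of nearly equal length."""
    bounds = [i * len(seq) // n_regions for i in range(n_regions + 1)]
    return [seq[bounds[i]:bounds[i + 1]] for i in range(n_regions)]


def multiscale_segments(seq, n_regions=4):
    """Return every continuous run of 1..n_regions adjacent regions.

    Short runs describe local context; longer runs overlap the shorter
    ones and describe the same positions at a coarser scale.
    """
    regions = split_regions(seq, n_regions)
    segments = []
    for width in range(1, n_regions + 1):
        for start in range(n_regions - width + 1):
            segments.append("".join(regions[start:start + width]))
    return segments


def composition(segment):
    """Fraction of each standard amino acid in segment (assumed descriptor)."""
    counts = Counter(segment)
    total = len(segment) or 1  # avoid division by zero on empty segments
    return [counts[aa] / total for aa in AMINO_ACIDS]


def multiscale_features(seq, n_regions=4):
    """Concatenate the composition of every multi-scale segment."""
    features = []
    for segment in multiscale_segments(seq, n_regions):
        features.extend(composition(segment))
    return features


def pair_features(seq_a, seq_b, n_regions=4):
    """Feature vector for a candidate protein pair: both proteins, concatenated."""
    return multiscale_features(seq_a, n_regions) + multiscale_features(seq_b, n_regions)
```

With four regions this sketch produces ten segments per protein, so each protein maps to 200 values and each pair to 400. Vectors of this kind, labelled as interacting or non-interacting, could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`; the abstract does not say which implementation was used.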


Cited by 149 articles