An iterative knowledge‐based scoring function for protein–protein recognition

Abstract
Using an efficient iterative method, we have developed a distance‐dependent knowledge‐based scoring function to predict protein–protein interactions. The function, referred to as ITScore‐PP, was derived from the crystal structures of a training set of 851 protein–protein dimeric complexes containing true biological interfaces. The key idea of the iterative method is to refine the interatomic pair potentials step by step until they can distinguish true binding modes from decoy modes for the protein–protein complexes in the training set. The iterative method circumvents the challenging reference state problem in deriving knowledge‐based potentials. The derived scoring function was used to evaluate the ligand orientations generated by ZDOCK 2.1, together with the native ligand structures, on a diverse set of 91 protein–protein complexes. For the bound test cases, ITScore‐PP yielded a success rate of 98.9% when the top 10 ranked orientations were considered. For the more realistic unbound test cases, the corresponding success rate was 40.7%. Furthermore, to enable faster orientational sampling, several residue‐level knowledge‐based scoring functions were also derived following a similar iterative procedure. Among them, the scoring function that uses the side‐chain center of mass (SCM) to represent a residue, referred to as ITScore‐PP(SCM), showed the best performance and yielded success rates of 71.4% and 30.8% for the bound and unbound cases, respectively, when the top 10 orientations were considered. ITScore‐PP was further tested using two other published protein–protein docking decoy sets, the ZDOCK decoy set and the RosettaDock decoy set. In addition to binding mode prediction, the binding scores predicted by ITScore‐PP also correlated well with the experimentally determined binding affinities, yielding a correlation coefficient of R = 0.71 on a test set of 74 protein–protein complexes with known affinities.
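The iterative idea described above can be illustrated with a minimal toy sketch. It is not the published ITScore‐PP procedure: the abstract does not state the update rule, so the form used here (shift each distance‐bin potential by the difference between the Boltzmann‐averaged predicted pair distribution and the native one) is an assumption, as are the single atom‐pair type, the bin counts, the step size `lam`, and all function names.

```python
import math

def score(counts, u):
    """Pose score: sum of pair counts per distance bin times the bin potential."""
    return sum(c * e for c, e in zip(counts, u))

def normalize(v):
    """Turn raw bin counts into a distribution that sums to 1."""
    total = sum(v)
    return [x / total for x in v]

def boltzmann_average(poses, u, kT=1.0):
    """Average the bin counts over all poses, weighted by exp(-score / kT)."""
    energies = [score(p, u) for p in poses]
    e_min = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(weights)
    return [sum(w * p[k] for w, p in zip(weights, poses)) / z
            for k in range(len(u))]

def iterate_potential(native, decoys, lam=0.5, max_iter=200):
    """Refine bin potentials until the native pose scores below every decoy.

    Assumed update (hypothetical): u[k] += lam * (g_pred[k] - g_obs[k]).
    Returns the potentials and the number of updates performed.
    """
    poses = [native] + decoys
    g_obs = normalize(native)      # distribution observed in the native pose
    u = [0.0] * len(native)        # start from a flat potential
    for n in range(max_iter):
        if all(score(native, u) < score(d, u) for d in decoys):
            return u, n            # native is now ranked first
        g_pred = normalize(boltzmann_average(poses, u))
        u = [e + lam * (gp - go) for e, gp, go in zip(u, g_pred, g_obs)]
    return u, max_iter

# Toy data: pair counts in four distance bins for one native pose and three decoys.
native = [0, 6, 3, 1]
decoys = [[4, 2, 2, 2], [1, 3, 3, 3], [0, 2, 4, 4]]
u, n_iter = iterate_potential(native, decoys)
print(n_iter, [round(e, 4) for e in u])
```

Because the update compares predicted and observed distributions directly, no explicit reference state enters the sketch; bins that are over‐populated in the decoys relative to the native pose are penalized, and under‐populated bins are rewarded.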
ITScore‐PP is computationally efficient. The average run time for ITScore‐PP was about 0.03 s per orientation (including optimization) on a personal computer with a 3.2 GHz Pentium IV CPU and 3.0 GB of RAM. ITScore‐PP(SCM) is about an order of magnitude faster than ITScore‐PP. ITScore‐PP and/or ITScore‐PP(SCM) can be combined with efficient protein docking software to study protein–protein recognition. Proteins 2008.