Progress in super long loop prediction

22 July 2011

journal article
research article
Published by Wiley in Proteins-Structure Function and Bioinformatics

Vol. 79 (10), 2920-2935
https://doi.org/10.1002/prot.23129

Abstract

Sampling errors are very common in the prediction of super long loops (here, loops of more than thirteen residues), simply because the sampling space is vast. We have developed a dipeptide segment sampling algorithm to address this problem. As a first step in evaluating its performance, the algorithm was applied to the problem of reconstructing loops in native protein structures. With a newly constructed test set of 89 loops ranging from 14 to 17 residues, this method obtains average/median global backbone root‐mean‐square deviations (RMSDs) to the native structure (superimposing the body of the protein, not the loop itself) of 1.46/0.68 Å. By loop length, the results are 1.19/0.67 Å for 36 fourteen‐residue loops, 1.55/0.75 Å for 30 fifteen‐residue loops, 1.43/0.80 Å for 14 sixteen‐residue loops, and 2.30/1.92 Å for nine seventeen‐residue loops. In the vast majority of cases, the method locates energy minima that are lower than or equal to that of the minimized native loop, indicating that the new sampling method is successful and rarely limits prediction accuracy. Median RMSDs are substantially lower than the averages because of a small number of outliers. The causes of these failures are examined in some detail, and some can be attributed to flaws in the energy function; for example, π–π interactions are not accurately accounted for by the OPLS‐AA force field employed in this study. By introducing a new energy model with a superior description of π–π interactions, significantly better results were achieved for quite a few former outliers. Crystal packing is explicitly included in order to provide a fair comparison with crystal structures. Proteins 2011.
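The accuracy metric in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `global_backbone_rmsd` and `pooled_average` are hypothetical, and the RMSD function assumes both coordinate sets are already expressed in the frame obtained by superimposing the rest of the protein, so no fitting is performed on the loop itself. The second function only checks that the per-length averages quoted in the abstract are consistent with the overall average of 1.46 Å. Medians cannot be pooled this way, so the overall median is not reproduced.

```python
import math


def global_backbone_rmsd(predicted, native):
    """RMSD between two equal-length lists of (x, y, z) backbone atom coordinates.

    Assumption: both sets are already in the frame obtained by superimposing
    the body of the protein, so the loop atoms are compared without any
    further fitting.
    """
    if len(predicted) != len(native) or not native:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (p - q) ** 2
        for atom_pred, atom_nat in zip(predicted, native)
        for p, q in zip(atom_pred, atom_nat)
    )
    return math.sqrt(squared / len(native))


# loop length -> (number of loops, average RMSD in Å), as quoted in the abstract
PER_LENGTH = {14: (36, 1.19), 15: (30, 1.55), 16: (14, 1.43), 17: (9, 2.30)}


def pooled_average(groups):
    """Count-weighted mean of the per-group averages."""
    total = sum(count for count, _ in groups.values())
    return sum(count * avg for count, avg in groups.values()) / total


if __name__ == "__main__":
    # Toy two-atom example: every atom displaced by 1 Å along x.
    print(global_backbone_rmsd([(0, 0, 0), (1, 1, 1)], [(1, 0, 0), (2, 1, 1)]))
    print(round(pooled_average(PER_LENGTH), 2))
```

Because the loop is not fitted onto itself, this global measure penalizes a loop that has the right shape but is misplaced relative to the protein body, which a locally superimposed RMSD would not.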

Funding Information

NIH (GM-52018)
RAF is a stockholder in Schrödinger, Inc.
Board of Directors and Scientific Advisory Board of Schrödinger, Inc.

