Assessment of comparative modeling in CASP2

Abstract
An assessment is presented for all submissions to the comparative modeling challenge in the 1996 Critical Assessment of Structure Prediction (CASP2). Of the original 12 target structures, 9 were solved prior to the meeting: 8 by X-ray crystallography and 1 by NMR spectroscopy. These targets varied over a large range of difficulty, as assessed by the percentage sequence identity with the principal parent structure, which ranged from 20% up to 85%. The overall quality of the models reflected the similarity of the principal parent. As expected, when the sequence alignment was correct, the core was accurately modeled, with the largest deviations occurring in the loops. Models were built which gave C alpha root-mean-square deviations (RMSDs) compared with the observed structure of 18 Angstrom. Compared with CASP1, the geometry of the models was significantly improved, with no D-amino acids. By far the major contribution to RMSD error was the alignment accuracy, which varied from 100% down to 7% over the range of targets. In the structurally variable regions, global shifts, caused by hinge bending, were the major source of error, giving significantly lower local RMSDs than global RMSDs. In over 50% of these noncore regions, the difference between global and local RMSDs was more than 3 Angstrom and was as high as 10 Angstrom for one structurally variable region. For the side chains, the chi(1) RMSDs are strongly correlated with the C alpha RMSDs. For models with C alpha deviations less than 1 Angstrom, on average 78.5% of side chains are placed in the correct rotamer, but the chi(1) RMSDs, though clearly better than random, were disappointing at around 46 degrees. As the backbone deviations increased, the side chain placement became less accurate, with an average chi(1) RMSD of 75 degrees on a 1.5-2.5 Angstrom C alpha backbone (average 51.4% correct rotamer).
Refinement by energy minimization or molecular dynamics made only minor adjustments to improve local geometry and generally made small, but not significant, improvements to the RMSD. In total, 19 groups submitted 62 models (89 coordinate sets) that could be assessed. Most modelers used manual adjustments to sequence alignments and, in general, good alignments were obtained down to 25% sequence identity. The modeling methods ranged from "classical" modeling, involving core building followed by loop and side chain addition, to more sophisticated approaches based on probability distributions, Monte Carlo sampling or distance geometry. For each target, several groups produced equally good models, given the expected errors in the structures (about 0.5 Angstrom). No one method came out as clearly superior, although the approaches that inherit directly from the parents generally performed better than the more radical techniques. However, for each target there were some poor models, usually reflecting a poor sequence alignment, and the range of accuracy for each target is therefore large. Fully automated methods are able to perform very well for "easy" targets (85% sequence identity with parent), but when modeling using a distantly related parent, care and expertise, especially in performing the alignment, still appear to be important factors in generating accurate models. (C) 1998 Wiley-Liss, Inc.
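
The side chain measures quoted in the abstract (chi(1) RMSD and percentage of side chains in the correct rotamer) can be illustrated with a minimal sketch. It is not the assessors' actual procedure. The function names and the 40 degree tolerance used to call a rotamer "correct" are assumptions made for illustration; the abstract does not state the criterion that was applied. The one point the sketch is meant to show is that dihedral differences must be wrapped onto the circle, so that 170 and -170 degrees count as 20 degrees apart rather than 340.

```python
import math


def angle_diff(a, b):
    """Signed difference a - b in degrees, wrapped to the range [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def chi1_rmsd(model, observed):
    """RMSD in degrees between paired model and observed chi(1) angles."""
    if not model or len(model) != len(observed):
        raise ValueError("need two non-empty lists of equal length")
    squared = [angle_diff(m, o) ** 2 for m, o in zip(model, observed)]
    return math.sqrt(sum(squared) / len(squared))


def fraction_correct(model, observed, tolerance=40.0):
    """Fraction of side chains whose chi(1) lies within `tolerance` degrees
    of the observed value. The 40 degree default is an assumed cutoff."""
    if not model or len(model) != len(observed):
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(
        1 for m, o in zip(model, observed) if abs(angle_diff(m, o)) <= tolerance
    )
    return hits / len(model)


if __name__ == "__main__":
    # Invented angles, for illustration only (not data from the assessment).
    model_chi1 = [60.0, 170.0, -60.0]
    observed_chi1 = [65.0, -170.0, 180.0]
    print(chi1_rmsd(model_chi1, observed_chi1))
    print(fraction_correct(model_chi1, observed_chi1))
```

In this invented example the first two side chains fall within the tolerance and the third does not, which shows how a single misplaced rotamer can dominate the chi(1) RMSD while changing the percentage-correct figure only modestly.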