Assessing model accuracy using the homology modeling automatically software

Abstract
Homology modeling is a powerful technique that greatly increases the value of experimental structure determination by using the structural information of one protein to predict the structures of homologous proteins. We have previously described a method of homology modeling by satisfaction of spatial restraints (Li et al., Protein Sci 1997;6:956–970). The Homology Modeling Automatically (HOMA) web site, , is a new tool, using this method to predict 3D structure of a target protein based on the sequence alignment of the target protein to a template protein and the structure coordinates of the template. The user is presented with the resulting models, together with an extensive structure validation report providing critical assessments of the quality of the resulting homology models. The homology modeling method employed by HOMA was assessed and validated using twenty‐four groups of homologous proteins. Using HOMA, homology models were generated for 510 proteins, including 264 proteins modeled with correct folds and 246 modeled with incorrect folds. Accuracies of these models were assessed by superimposition on the corresponding experimentally determined structures. A subset of these results was compared with parallel studies of modeling accuracy using several other automated homology modeling approaches. Overall, HOMA provides prediction accuracies similar to other state‐of‐the‐art homology modeling methods. We also provide an evaluation of several structure quality validation tools in assessing the accuracy of homology models generated with HOMA. This study demonstrates that Verify3D (Luthy et al., Nature 1992;356:83–85) and ProsaII (Sippl, Proteins 1993;17:355–362) are most sensitive in distinguishing between homology models with correct or incorrect folds. For homology models that have the correct fold, the steric conformational energy (including primarily the Van der Waals energy), MolProbity clashscore (Word et al., Protein Sci 2000;9:2251–2259), and the PROCHECK G‐factors (Laskowski et al., J Biomol NMR 1996;8:477–486) provide sensitive and consistent methods for assessing accuracy and can distinguish between homology models of higher and lower accuracy. As demonstrated in the accompanying paper (Bhattacharya et al., accompanying paper), combinations of these scores for models generated with HOMA provide a basis for distinguishing low from high accuracy models. Proteins 2008.