Protein structure validation by generalized linear model root‐mean‐square deviation prediction
Open Access
- 23 November 2011
- journal article
- research article
- Published by Wiley in Protein Science
- Vol. 21 (2), 229-238
- https://doi.org/10.1002/pro.2007
Abstract
Large‐scale initiatives for obtaining spatial protein structures by experimental or computational means have accentuated the need for the critical assessment of protein structure determination and prediction methods. These include blind test projects such as the critical assessment of protein structure prediction (CASP) and the critical assessment of protein structure determination by nuclear magnetic resonance (CASD‐NMR). An important aim is to establish structure validation criteria that can reliably assess the accuracy of a new protein structure. Various quality measures derived from the coordinates have been proposed. A universal structural quality assessment method should combine multiple individual scores in a meaningful way, which is challenging because of their different measurement units. Here, we present a method based on a generalized linear model (GLM) that combines diverse protein structure quality scores into a single quantity with intuitive meaning, namely the predicted coordinate root‐mean‐square deviation (RMSD) value between the present structure and the (unavailable) “true” structure (GLM‐RMSD). For two sets of structural models from the CASD‐NMR and CASP projects, this GLM‐RMSD value was compared with the actual accuracy given by the RMSD value to the corresponding, experimentally determined reference structure from the Protein Data Bank (PDB). The correlation coefficients between actual (model vs. reference from PDB) and predicted (model vs. “true”) heavy‐atom RMSDs were 0.69 and 0.76 for the two datasets from CASD‐NMR and CASP, respectively, which are considerably higher than those for the individual scores (−0.24 to 0.68). The GLM‐RMSD can thus predict the accuracy of protein structures more reliably than individual coordinate‐based quality scores.
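The idea of combining several quality scores into one predicted RMSD can be illustrated with a minimal sketch. The abstract does not state the model family, the link function, or which scores enter the model, so everything specific here is an assumption made for illustration: a Gamma family with a log link (chosen only because RMSD is positive), three unnamed hypothetical scores, and synthetic data. The function names `fit_gamma_log_glm` and `predict_rmsd` are likewise hypothetical and not taken from the paper.

```python
import numpy as np


def fit_gamma_log_glm(X, y, n_iter=50, tol=1e-10):
    """Fit a Gamma GLM with log link by iteratively reweighted least squares.

    ASSUMPTION: family and link are illustrative choices, not the paper's.
    For this family/link pair the IRLS weights are all equal to 1, so each
    iteration reduces to an ordinary least-squares fit of the working response.
    """
    A = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    beta = np.zeros(A.shape[1])
    beta[0] = np.log(y.mean())                 # start from the mean RMSD
    for _ in range(n_iter):
        eta = A @ beta                         # linear predictor
        mu = np.exp(eta)                       # inverse log link
        z = eta + (y - mu) / mu                # working response
        new_beta, *_ = np.linalg.lstsq(A, z, rcond=None)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


def predict_rmsd(beta, X):
    """Predicted RMSD (always positive) for a matrix of quality scores."""
    A = np.column_stack([np.ones(X.shape[0]), X])
    return np.exp(A @ beta)


if __name__ == "__main__":
    # SYNTHETIC data: three hypothetical quality scores per structural model.
    rng = np.random.default_rng(0)
    scores = rng.normal(size=(400, 3))
    mean_rmsd = np.exp(0.3 + scores @ np.array([0.5, -0.4, 0.2]))
    actual_rmsd = rng.gamma(shape=20.0, scale=mean_rmsd / 20.0)

    beta = fit_gamma_log_glm(scores, actual_rmsd)
    predicted = predict_rmsd(beta, scores)
    # Pearson correlation between actual and predicted RMSD, analogous in
    # spirit to the comparison reported in the abstract.
    print("coefficients:", beta)
    print("correlation:", np.corrcoef(predicted, actual_rmsd)[0, 1])
```

The correlation printed at the end is the same kind of quantity the abstract reports (actual vs. predicted RMSD), but the value obtained here reflects only the synthetic data and says nothing about the published results.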
Funding Information
- Volkswagen Foundation
- Deutsche Forschungsgemeinschaft (DFG grant JA1952/1-1 (to V.J. and P.G.))
- e-NMR and WeNMR projects of the European Commission and Japan Society for the Promotion of Science (JSPS)
- National Institutes of Health Protein Structure Initiative (U54 GM094597 (to G.T.M.))