SFCscore: Scoring functions for affinity prediction of protein–ligand complexes

Abstract
Empirical scoring functions to calculate binding affinities of protein–ligand complexes have been calibrated based on experimental structure and affinity data collected from public and industrial sources. Public data were taken from the AffinDB database, whereas access to industrial data was gained through the Scoring Function Consortium (SFC), a collaborative effort with various pharmaceutical companies and the Cambridge Crystallographic Data Center. More than 850 complexes were obtained by the data collection procedure and subsequently used to set up different training sets for the parameterization of new scoring functions. Over 60 different descriptors were evaluated for all complexes, including terms accounting for interactions with and among aromatic ring systems as well as many surface-dependent terms. After exploratory correlation and regression analyses, stepwise variable selection procedures and systematic searches, the most suitable descriptors were chosen as variables to calibrate regression functions by means of multiple linear regression or partial least squares analysis. Eight different functions are presented herein. Cross-validated r² (Q²) values of up to 0.72 and standard errors (sPRESS) generally below 1.15 pKi units suggest highly predictive functions. Extensive unbiased validation was carried out by testing the functions on large data sets from the PDBbind database as used by Wang et al. (J Chem Inf Comput Sci 2004;44:2114–2125) in a comparative analysis of other scoring functions. Superior performance of the SFCscore functions is observed in many cases, but the results also illustrate the need for further improvements. Proteins 2008.
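
The cross-validated statistics quoted above can be illustrated with a minimal sketch. It is not the SFCscore implementation: it assumes leave-one-out cross-validation (the abstract does not state the scheme), ordinary multiple linear regression, and the common definitions Q² = 1 − PRESS/SS_total and sPRESS = sqrt(PRESS/(n − k − 1)) for n complexes and k descriptors. The function names and the toy data are hypothetical and carry no relation to the SFC data sets.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: descriptors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(X, y):
    """Least-squares fit of y = c0 + sum(c_j * x_j) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coef, x):
    """Predicted affinity for one descriptor vector x."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def loo_q2(X, y):
    """Return (Q2, s_PRESS) from leave-one-out cross-validation (assumed scheme)."""
    n, k = len(y), len(X[0])
    press = 0.0
    for i in range(n):
        # Refit without complex i, then predict the held-out complex.
        coef = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    mean_y = sum(y) / n
    ss_total = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - press / ss_total, math.sqrt(press / (n - k - 1))


# Hypothetical toy data: two descriptors per complex, affinities in pKi units.
toy_X = [[0.2, 1.0], [1.1, 0.4], [0.5, 2.2], [1.8, 1.5],
         [2.4, 0.9], [1.0, 3.1], [2.9, 2.0], [0.7, 0.3]]
toy_y = [4.1, 5.6, 4.0, 6.2, 7.4, 4.3, 7.6, 5.0]

q2, s_press = loo_q2(toy_X, toy_y)
print(f"Q2 = {q2:.2f}, s_PRESS = {s_press:.2f} pKi units")
```

Refitting the model once per held-out complex keeps the sketch close to the definition of PRESS. Partial least squares, the other calibration method named in the abstract, would replace `fit_mlr` but leave the cross-validation loop unchanged.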