Modeling Aqueous Solubility

1 May 2003

journal article
research article
Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences

Vol. 43 (3), 837-841
https://doi.org/10.1021/ci020279y

Abstract

This paper describes the development of an aqueous solubility model based on solubility data from the Syracuse database, calculated octanol−water partition coefficient, and 51 2D molecular descriptors. Two different statistical packages, SIMCA and Cubist, were used and the results were compared. The Cubist model, which comprises a collection of rules, each of which has an associated multiple linear regression (MLR) model, gave better overall results on a test set of 640 compounds, with an overall squared correlation coefficient of 0.74 and an absolute average error of 0.68 log units. Both training and independent test sets had similar distributions of structures in terms of the different functionalities present: 60% neutral, 14% acidic, 8% phenolic, 11% monobasic, 4% polybasic, and 3% zwitterionic molecules. Sets were designed by random selection, with 2688 (81%) and 640 (19%) molecules, respectively, forming the training and the test sets.
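The random split and the two test-set statistics quoted in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function names, the fixed seed, and the use of integer indices in place of real compounds are all assumptions made for illustration, and the squared correlation coefficient is assumed to be the squared Pearson correlation between observed and predicted values, which the abstract does not state explicitly.

```python
import random


def random_split(items, n_test, seed=0):
    """Randomly partition items into (train, test), holding out n_test items.

    The seed is an arbitrary choice made only to keep the sketch reproducible.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def squared_correlation(observed, predicted):
    """Squared Pearson correlation between observed and predicted values
    (assumed definition of the 'squared correlation coefficient')."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def average_absolute_error(observed, predicted):
    """Mean of |observed - predicted|, in the units of the inputs (log units here)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Placeholder indices standing in for the 2688 + 640 = 3328 compounds.
train, test = random_split(range(3328), n_test=640)
print(len(train), len(test))
```

With 3328 compounds in total, holding out 640 leaves 2688 for training, which matches the 81% and 19% proportions given in the abstract.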

Cited by 49 articles