Straightforward Recursive Partitioning Model for Discarding Insoluble Compounds in the Drug Discovery Process

Abstract
Poor aqueous solubility is one of the major issues in drug discovery and development, with a negative impact on all aspects of the research and development process. The pharmaceutical industry has realized that solubility issues need to be resolved at the discovery stage. Here we present an innovative approach to this problem: a model designed to answer the simple question, “Is the compound likely to be sufficiently soluble to provide interpretable data in biological screening assays?” A recursive partitioning (RP) method was applied to a set of 3563 molecules with aqueous solubility values determined in house. Five models were generated on the basis of a small number of descriptors that afford intuitive information about the structural features influencing solubility. The final model was based on only two descriptors: the molecular weight (MW) and the aromatic proportion (AP). This model provided satisfactory accuracy (81%) and precision (75%) for a test set of 1200 compounds, suggesting that the model may add value in compound selection and library design during early drug discovery.
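
To make the shape of a two-descriptor recursive partitioning model concrete, the following is a minimal sketch and not the published model. It assumes that AP is the fraction of heavy atoms that are aromatic, that "insoluble" is the positive class when computing precision, and that the tree splits first on MW and then on AP. The cut-off values `MW_CUTOFF` and `AP_CUTOFF`, the `Compound` class, and all function names are hypothetical; the abstract does not report the fitted split points.

```python
from dataclasses import dataclass

# Hypothetical cut-offs for illustration only; the abstract does not
# report the split values of the fitted tree.
MW_CUTOFF = 400.0
AP_CUTOFF = 0.5


@dataclass
class Compound:
    mw: float            # molecular weight (MW)
    aromatic_atoms: int  # number of aromatic heavy atoms
    heavy_atoms: int     # total number of non-hydrogen atoms


def aromatic_proportion(c: Compound) -> float:
    """AP, assumed here to be aromatic atoms divided by heavy atoms."""
    return c.aromatic_atoms / c.heavy_atoms if c.heavy_atoms else 0.0


def predict_insoluble(c: Compound,
                      mw_cutoff: float = MW_CUTOFF,
                      ap_cutoff: float = AP_CUTOFF) -> bool:
    """Two-level tree: split on MW first, then on AP (assumed order)."""
    if c.mw <= mw_cutoff:
        return False  # lighter compounds are predicted soluble
    return aromatic_proportion(c) > ap_cutoff


def accuracy_and_precision(predicted, actual):
    """Accuracy and precision, treating 'insoluble' (True) as positive."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    correct = sum(1 for p, a in pairs if p == a)
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


if __name__ == "__main__":
    # Made-up compounds, not data from the study.
    library = [Compound(250.0, 6, 18), Compound(520.0, 24, 36)]
    print([predict_insoluble(c) for c in library])
```

In practice the split order and thresholds would be learned from measured solubility data by the RP algorithm rather than set by hand, and the descriptors would be computed from structures with a cheminformatics toolkit.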
