An in silico ensemble method for lead discovery: decision forest

Abstract
Recent progress in combinatorial chemistry and parallel synthesis has radically changed the approach to drug discovery in the pharmaceutical industry. At present, thousands of compounds can be made in a short period, creating a need for fast and effective in silico methods to select the most promising lead candidates. Decision forest is a novel pattern recognition method that combines the results of multiple distinct but comparable decision tree models to reach a consensus prediction. In this article, a decision forest model was developed using a structurally diverse training data set of 232 compounds whose estrogen receptor binding activity was tested at the National Center for Toxicological Research (NCTR) of the U.S. Food and Drug Administration (FDA). The model was subsequently validated using a test data set of 463 compounds selected from the literature, and then applied to a large data set of 57,145 compounds as a screening example. The results show that the decision forest method is a fast, reliable and effective in silico approach that could be useful in drug discovery.
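
The consensus step described above can be illustrated with a minimal sketch. The abstract does not state how the individual tree results are combined, so the rule used here (averaging per-tree probabilities and calling a compound active at a 0.5 cutoff) is an assumption, as are the names `consensus_probability`, `consensus_predict`, and the toy trees, which merely stand in for trained decision tree models.

```python
from statistics import mean
from typing import Callable, Sequence

# Hypothetical type: a "tree" is any callable that maps a vector of
# molecular descriptors to a probability that the compound is active.
Tree = Callable[[Sequence[float]], float]


def consensus_probability(trees: Sequence[Tree], x: Sequence[float]) -> float:
    """Combine the trees by averaging their probabilities (assumed rule)."""
    if not trees:
        raise ValueError("at least one tree is required")
    return mean(tree(x) for tree in trees)


def consensus_predict(trees: Sequence[Tree], x: Sequence[float],
                      threshold: float = 0.5) -> bool:
    """Call a compound active when the averaged probability reaches the cutoff.

    The 0.5 default is an illustrative assumption, not a value from the source.
    """
    return consensus_probability(trees, x) >= threshold


# Toy stand-ins for trained trees; each one splits on a different descriptor,
# mimicking "distinct but comparable" models.
toy_trees = [
    lambda x: 0.9 if x[0] > 1.0 else 0.2,
    lambda x: 0.8 if x[1] > 0.5 else 0.1,
    lambda x: 0.7 if x[2] > 2.0 else 0.3,
]

compound = [1.5, 0.7, 1.0]  # made-up descriptor values
p_active = consensus_probability(toy_trees, compound)
is_active = consensus_predict(toy_trees, compound)
```

In this toy case two of the three trees favour activity, and the averaged probability still exceeds the cutoff even though the third tree disagrees, which is the kind of smoothing a consensus prediction is meant to provide.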