Developing and validating predictive decision tree models from mining chemical structural fingerprints and high-throughput screening data in PubChem

Open Access

25 September 2008

journal article
Published by Springer Science and Business Media LLC in BMC Bioinformatics

Vol. 9 (1), 401
https://doi.org/10.1186/1471-2105-9-401

Abstract

Background: Recent advances in high-throughput screening (HTS) techniques and readily available compound libraries, generated using combinatorial chemistry or derived from natural products, enable the testing of millions of compounds in a matter of days. Because of the amount of information produced by HTS assays, it is very challenging to mine the HTS data for information of potential interest to drug development research. Computational approaches to analyzing HTS results face great challenges due to the large quantity of information and the significant amount of erroneous data produced.

Results: In this study, Decision Tree (DT) based models were developed to discriminate compound bioactivities using the chemical structure fingerprints provided in the PubChem system (http://pubchem.ncbi.nlm.nih.gov). The DT models were examined for filtering biological activity data contained in four assays deposited in the PubChem Bioassay Database, including assays tested for 5HT1a agonists, antagonists, and HIV-1 RT-RNase H inhibitors. The 10-fold Cross Validation (CV) sensitivity, specificity, and Matthews Correlation Coefficient (MCC) for the models are 57.2 to 80.5%, 97.3 to 99.0%, and 0.4 to 0.5, respectively. A further evaluation was performed for DT models built for two independent bioassays in which inhibitors of the same HIV RNase target were screened using different compound libraries. This experiment yielded enrichment factors of 4.4 and 9.7.

Conclusion: Our results suggest that the designed DT models can be used as a virtual screening technique as well as a complement to traditional approaches for hit selection.
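
For readers unfamiliar with the metrics quoted in the abstract, the following is a minimal sketch of how sensitivity, specificity, MCC, and an enrichment factor are commonly computed from a binary confusion matrix. It is not the authors' code. The function names and the example counts are hypothetical, and the enrichment factor uses the usual definition (hit rate among predicted actives divided by the hit rate in the whole library), which is assumed here to match the paper's usage.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true actives that the model predicts as active."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of true inactives that the model predicts as inactive."""
    return tn / (tn + fp)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def enrichment_factor(tp: int, tn: int, fp: int, fn: int) -> float:
    """Hit rate among predicted actives relative to the library-wide hit rate.

    Assumed definition: (tp / (tp + fp)) / ((tp + fn) / total).
    """
    total = tp + tn + fp + fn
    selected_hit_rate = tp / (tp + fp)
    baseline_hit_rate = (tp + fn) / total
    return selected_hit_rate / baseline_hit_rate


# Hypothetical counts for illustration only; they are not data from the paper.
example = {"tp": 80, "tn": 970, "fp": 30, "fn": 20}

if __name__ == "__main__":
    print("sensitivity:", sensitivity(example["tp"], example["fn"]))
    print("specificity:", specificity(example["tn"], example["fp"]))
    print("MCC:", mcc(**example))
    print("enrichment factor:", enrichment_factor(**example))
```

An enrichment factor above 1 means the compounds the model flags as active contain a higher proportion of true actives than the library as a whole.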

Cited by 90 articles