Brainstorming: weighted voting prediction of inhibitors for protein targets
Open Access
- 21 September 2010
- journal article
- research article
- Published by Springer Science and Business Media LLC in Journal of Molecular Modeling
- Vol. 17 (9), 2133-2141
- https://doi.org/10.1007/s00894-010-0854-x
Abstract
The “Brainstorming” approach presented in this paper is a weighted voting method that can improve the quality of predictions generated by several machine learning (ML) methods. First, an ensemble of heterogeneous ML algorithms is trained on the available experimental data; their predictions are then gathered and a consensus is built between them. The final prediction is made by a voting procedure in which the vote of each method is weighted by a quality coefficient calculated using multivariable linear regression (MLR). Because the MLR optimization procedure is very fast, this jury approach adds virtually no extra computational cost. Here, brainstorming is applied to selecting active compounds from large compound collections for five diverse biological targets of medicinal interest: HIV reverse transcriptase, cyclooxygenase-2, dihydrofolate reductase, estrogen receptor, and thrombin. Known inhibitors of these protein targets were selected from the MDL Drug Data Report (MDDR) database, and the experimental data were then used to train a set of machine learning methods. The benchmark dataset (available at http://bio.icm.edu.pl/~darman/chemoinfo/benchmark.tar.gz) can be used for further testing of clustering and machine learning methods for predicting the biological activity of compounds. Depending on the protein target, the overall recall is raised by at least 20% compared with any single machine learning method (including ensemble methods such as random forest) and with unweighted simple majority voting.
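The weighted voting step described in the abstract can be illustrated with a minimal sketch. It assumes that each trained method outputs a binary vote per compound (1 = active, 0 = inactive) and that the MLR step is an ordinary least-squares fit, with an intercept, of the experimental labels on those votes. The abstract does not state the exact regression setup or decision threshold, so the 0.5 cut-off and the helper names (`fit_vote_weights`, `weighted_vote`, `majority_vote`, `recall`) are hypothetical. The toy data show one reliable method being outvoted by two weak ones under simple majority voting, while the regression weights learn to trust it.

```python
import numpy as np


def fit_vote_weights(votes: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Fit [intercept, w_1, ..., w_m] by ordinary least squares of the true
    labels on the methods' 0/1 votes (an assumed form of the MLR step)."""
    design = np.column_stack([np.ones(len(votes)), votes])
    coef, *_ = np.linalg.lstsq(design, labels, rcond=None)
    return coef


def weighted_vote(votes: np.ndarray, coef: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Score = intercept + sum_j w_j * vote_j; a compound is called active
    when its score reaches the (hypothetical) threshold."""
    scores = coef[0] + votes @ coef[1:]
    return (scores >= threshold).astype(int)


def majority_vote(votes: np.ndarray) -> np.ndarray:
    """Unweighted simple majority: active if more than half of the methods vote 1."""
    return (2 * votes.sum(axis=1) > votes.shape[1]).astype(int)


def recall(pred: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of truly active compounds that are predicted active."""
    return float(pred[labels == 1].mean())


# Toy data (invented for illustration): 5 compounds (rows) x 3 methods (columns).
labels = np.array([1, 1, 1, 0, 0])
votes = np.array([
    [1, 1, 0],  # method A is always right; B and C flag few actives
    [1, 0, 1],
    [1, 0, 0],  # here A is outvoted by B and C under majority voting
    [0, 0, 0],
    [0, 0, 0],
])

coef = fit_vote_weights(votes, labels)
print("majority recall:", recall(majority_vote(votes), labels))  # 2 of 3 actives
print("weighted recall:", recall(weighted_vote(votes, coef), labels))  # all 3 actives
```

For brevity the weights are fitted and evaluated on the same toy compounds; in a real screen they would be fitted on predictions for held-out training data and then applied to new compounds.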