Active Learning with Support Vector Machines in the Drug Discovery Process

12 February 2003

journal article
research article
Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences

Vol. 43 (2), 667-673
https://doi.org/10.1021/ci025620t

Abstract

We investigate the following data mining problem from computer-aided drug design: From a large collection of compounds, find those that bind to a target molecule in as few iterations of biochemical testing as possible. In each iteration a comparatively small batch of compounds is screened for binding activity toward this target. We employed the so-called “active learning paradigm” from Machine Learning for selecting the successive batches. Our main selection strategy is based on the maximum margin hyperplane generated by “Support Vector Machines”. This hyperplane separates the current set of active from the inactive compounds and has the largest possible distance from any labeled compound. We perform a thorough comparative study of various other selection strategies on data sets provided by DuPont Pharmaceuticals and show that the strategies based on the maximum margin hyperplane clearly outperform the simpler ones.
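The batch-selection loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things are assumptions made for illustration only: the abstract does not say how the hyperplane is used to pick a batch, so the sketch picks the unlabeled compounds closest to the hyperplane; the hyperplane is approximated by plain subgradient descent on the regularized hinge loss (a stand-in for a proper SVM solver); and the names `train_linear_svm`, `select_batch`, `active_learning` and the `oracle` function (standing in for a biochemical test) are hypothetical, as are the toy 2-D "compounds".

```python
import random


def score(w, b, x):
    """Signed decision value w.x + b; its sign is the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, eta=0.01, epochs=200, seed=0):
    """Approximate a soft-margin linear SVM (assumed stand-in for a real solver).

    Runs stochastic subgradient descent on the L2-regularized hinge loss.
    X is a list of feature vectors, y a list of labels in {-1, +1}.
    """
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * score(w, b, X[i]) < 1  # inside the margin?
            w = [(1 - eta * lam) * wj for wj in w]   # regularization shrink
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def select_batch(w, b, pool, unlabeled, k):
    """Assumed strategy: the k unlabeled items nearest the hyperplane.

    |w.x + b| orders points the same way as geometric distance, because
    dividing by the norm of w does not change the ranking.
    """
    return sorted(unlabeled, key=lambda i: abs(score(w, b, pool[i])))[:k]


def active_learning(pool, oracle, init, batch_size, rounds):
    """Alternate between fitting the hyperplane and testing a new batch."""
    labeled = {i: oracle(i) for i in init}
    idx = list(labeled)
    w, b = train_linear_svm([pool[i] for i in idx], [labeled[i] for i in idx])
    for _ in range(rounds):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        for i in select_batch(w, b, pool, unlabeled, batch_size):
            labeled[i] = oracle(i)  # "screen" the selected compound
        idx = list(labeled)
        w, b = train_linear_svm([pool[i] for i in idx], [labeled[i] for i in idx])
    return labeled, (w, b)


if __name__ == "__main__":
    # Toy data: 2-D points, "active" when the coordinates sum to more than 0.
    rng = random.Random(0)
    pool = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
    oracle = lambda i: 1 if pool[i][0] + pool[i][1] > 0 else -1
    labeled, (w, b) = active_learning(pool, oracle, init=[0, 1],
                                      batch_size=5, rounds=5)
    hits = sum((score(w, b, x) > 0) == (oracle(i) > 0)
               for i, x in enumerate(pool))
    print(f"labeled {len(labeled)} of {len(pool)}; "
          f"agreement with oracle: {hits}/{len(pool)}")
```

Swapping the key in `select_batch` (for example, ranking by the largest positive decision value instead of the smallest absolute value) would give a different hyperplane-based strategy; the sketch makes no claim about which variants the paper compares.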

Cited by 274 articles