Deep Neural Nets as a Method for Quantitative Structure–Activity Relationships
- 17 February 2015
- research article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Modeling
- Vol. 55 (2), pp. 263–274
- https://doi.org/10.1021/ci500747n
Abstract
Neural networks were widely used for quantitative structure–activity relationships (QSAR) in the 1990s. Because of various practical issues (e.g., slow training on large problems, difficulty of training, proneness to overfitting), they were superseded by more robust methods such as the support vector machine (SVM) and random forest (RF), which arose in the early 2000s. The last 10 years have witnessed a revival of neural networks in the machine learning community thanks to new methods for preventing overfitting, more efficient training algorithms, and advances in computer hardware. In particular, deep neural nets (DNNs), i.e., neural nets with more than one hidden layer, have found great success in many applications, such as computer vision and natural language processing. Here we show that DNNs can routinely make better prospective predictions than RF on a set of large, diverse QSAR data sets taken from Merck’s drug discovery effort. The number of adjustable parameters needed for DNNs is fairly large, but our results show that it is not necessary to optimize them for individual data sets: a single set of recommended parameters can achieve better performance than RF for most of the data sets we studied. The usefulness of these parameters is demonstrated on additional data sets not used in the calibration. Although training DNNs is still computationally intensive, using graphical processing units (GPUs) makes this issue manageable.
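The abstract's definition of a DNN — a neural net with more than one hidden layer, here used for a regression target such as a QSAR activity — can be sketched as a plain feed-forward pass. This is a toy illustration with made-up weights and tiny layer sizes, not the architectures, descriptors, or parameter settings used in the paper:

```python
def relu(x):
    """Rectified linear unit, a common hidden-layer activation in modern DNNs."""
    return max(0.0, x)

def dense_layer(inputs, weights, biases):
    """One fully connected layer: out_j = relu(sum_i weights[j][i] * inputs[i] + biases[j])."""
    return [relu(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def dnn_predict(descriptors, hidden_params, output_params):
    """Forward pass of a 'deep' net: two or more hidden layers,
    then a linear output unit for the regression target."""
    h = descriptors
    for W, b in hidden_params:       # each entry is one hidden layer
        h = dense_layer(h, W, b)
    W_out, b_out = output_params     # linear output layer (no ReLU)
    return sum(w * x for w, x in zip(W_out, h)) + b_out

# Toy example: 2 molecular descriptors -> two hidden layers of 2 units -> 1 activity.
hidden = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),   # hidden layer 1
    ([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0]),  # hidden layer 2
]
output = ([1.0, 1.0], 0.0)
print(dnn_predict([1.0, 2.0], hidden, output))  # 3.0
```

In practice the weights are learned by gradient descent with overfitting controls such as dropout, and — as the abstract notes — training is run on GPUs to keep the cost manageable.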