Deep Neural Nets as a Method for Quantitative Structure–Activity Relationships
- 17 February 2015
- research article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Modeling
- Vol. 55 (2), pp. 263–274
- https://doi.org/10.1021/ci500747n
Abstract
Neural networks were widely used for quantitative structure–activity relationships (QSAR) in the 1990s. Because of various practical issues (e.g., slow training on large problems, difficulty of training, proneness to overfitting), they were superseded by more robust methods such as the support vector machine (SVM) and random forest (RF), which arose in the early 2000s. The last 10 years have witnessed a revival of neural networks in the machine learning community thanks to new methods for preventing overfitting, more efficient training algorithms, and advances in computer hardware. In particular, deep neural nets (DNNs), i.e., neural nets with more than one hidden layer, have found great success in many applications, such as computer vision and natural language processing. Here we show that DNNs can routinely make better prospective predictions than RF on a set of large, diverse QSAR data sets taken from Merck’s drug discovery effort. The number of adjustable parameters needed for DNNs is fairly large, but our results show that it is not necessary to optimize them for individual data sets: a single set of recommended parameters can achieve better performance than RF for most of the data sets we studied. The usefulness of these parameters is demonstrated on additional data sets not used in the calibration. Although training DNNs is still computationally intensive, using graphical processing units (GPUs) makes this issue manageable.
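The abstract's definition of a DNN — a neural net with more than one hidden layer, here used for a regression target such as a QSAR activity — can be sketched as a plain feed-forward pass. This is a toy illustration with made-up weights and tiny layer sizes, not the architectures, descriptors, or parameter settings used in the paper:

```python
def relu(x):
    """Rectified linear unit, a common hidden-layer activation in modern DNNs."""
    return max(0.0, x)

def dense_layer(inputs, weights, biases):
    """One fully connected layer: out_j = relu(sum_i weights[j][i] * inputs[i] + biases[j])."""
    return [relu(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def dnn_predict(descriptors, hidden_params, output_params):
    """Forward pass of a 'deep' net: two or more hidden layers,
    then a linear output unit for the regression target."""
    h = descriptors
    for W, b in hidden_params:       # each entry is one hidden layer
        h = dense_layer(h, W, b)
    W_out, b_out = output_params     # linear output layer (no ReLU)
    return sum(w * x for w, x in zip(W_out, h)) + b_out

# Toy example: 2 molecular descriptors -> two hidden layers of 2 units -> 1 activity.
hidden = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),   # hidden layer 1
    ([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0]),  # hidden layer 2
]
output = ([1.0, 1.0], 0.0)
print(dnn_predict([1.0, 2.0], hidden, output))  # 3.0
```

In practice the weights are learned by gradient descent with overfitting controls such as dropout, and — as the abstract notes — training is run on GPUs to keep the cost manageable.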