Toward an Optimal Procedure for Variable Selection and QSAR Model Building

1 August 2001

journal article
research article
Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences

Vol. 41 (5), 1218-1227
https://doi.org/10.1021/ci010291a

Abstract

In this work, we report the development of a novel QSAR technique that combines genetic algorithms and neural networks to select a subset of relevant descriptors and to build the optimal neural network architecture for QSAR studies. A neural network relates the dependent property of interest to the descriptors preselected by the genetic algorithm. The technique differs from other variable selection techniques that combine genetic algorithms with neural networks in two main respects: (1) the variable selection search performed by the genetic algorithm is not constrained to a defined number of descriptors; (2) the optimal neural network architecture is explored in parallel with the variable selection by dynamically modifying the size of the hidden layer. Using both artificial data and real biological data, we show that this technique can be used to build both classification and regression models and that it outperforms simpler variable selection techniques, mainly on nonlinear data sets. The results obtained on real data are compared with previous work using other modeling techniques. We also discuss some important issues in building QSAR models and good practices for QSAR studies.
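The two distinguishing features named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Candidate`, `mutate`, `crossover`, `evolve`, and `toy_fitness`, the truncation selection scheme, and all rates and sizes are assumptions made for illustration. In particular, `toy_fitness` is a placeholder; in a real study the fitness would presumably come from the validation performance of a neural network trained on the selected descriptors with the given hidden-layer size.

```python
import random
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Candidate:
    mask: Tuple[bool, ...]  # one flag per descriptor; any number may be switched on
    hidden: int             # hidden-layer size, evolved together with the mask


def random_candidate(n_desc: int, max_hidden: int, rng: random.Random) -> Candidate:
    mask = tuple(rng.random() < 0.5 for _ in range(n_desc))
    return Candidate(mask, rng.randint(1, max_hidden))


def mutate(c: Candidate, max_hidden: int, rng: random.Random,
           p_flip: float = 0.1, p_resize: float = 0.2) -> Candidate:
    # Flip descriptor flags independently, so the subset size is free to change.
    mask = tuple((not b) if rng.random() < p_flip else b for b in c.mask)
    hidden = c.hidden
    if rng.random() < p_resize:
        # Grow or shrink the hidden layer by one unit, clamped to [1, max_hidden].
        hidden = min(max_hidden, max(1, hidden + rng.choice((-1, 1))))
    return Candidate(mask, hidden)


def crossover(a: Candidate, b: Candidate, rng: random.Random) -> Candidate:
    # Uniform crossover on the mask; the hidden size is inherited from one parent.
    mask = tuple(x if rng.random() < 0.5 else y for x, y in zip(a.mask, b.mask))
    return Candidate(mask, rng.choice((a.hidden, b.hidden)))


def evolve(fitness: Callable[[Candidate], float], n_desc: int,
           max_hidden: int = 8, pop_size: int = 20,
           generations: int = 30, seed: int = 0) -> Candidate:
    rng = random.Random(seed)
    pop = [random_candidate(n_desc, max_hidden, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # assumed scheme: keep the better half unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng),
                   max_hidden, rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    return max(pop, key=fitness)


def toy_fitness(c: Candidate) -> float:
    # Placeholder score (assumption): descriptors 0 and 3 are "relevant",
    # extra descriptors are penalized, and a hidden layer of size 3 is preferred.
    relevant = {0, 3}
    chosen = {i for i, flag in enumerate(c.mask) if flag}
    return len(chosen & relevant) - 0.5 * len(chosen - relevant) - 0.1 * abs(c.hidden - 3)


best = evolve(toy_fitness, n_desc=10)
print(sum(best.mask), "descriptors selected, hidden layer size", best.hidden)
```

Encoding the hidden-layer size in the same candidate as the descriptor mask is what lets the architecture be searched in parallel with the variable selection, and because the mask is a free bit vector, nothing fixes the number of selected descriptors in advance. A candidate with an empty mask is possible in this sketch, so a real fitness function would need to handle that case.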
