Toward an Optimal Procedure for Variable Selection and QSAR Model Building
- 1 August 2001
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences
- Vol. 41 (5), 1218-1227
- https://doi.org/10.1021/ci010291a
Abstract
In this work, we report the development of a novel QSAR technique that combines genetic algorithms and neural networks to select a subset of relevant descriptors and to build the optimal neural network architecture for QSAR studies. The technique uses a neural network to map the descriptors preselected by the genetic algorithm to the dependent property of interest. It differs from other variable selection techniques that combine genetic algorithms with neural networks in two main respects: (1) the variable selection search performed by the genetic algorithm is not constrained to a fixed number of descriptors; (2) the optimal neural network architecture is explored in parallel with the variable selection by dynamically modifying the size of the hidden layer. Using both artificial data and real biological data, we show that this technique can be used to build both classification and regression models and that it outperforms simpler variable selection techniques, mainly on nonlinear data sets. The results obtained on real data are compared with previous work using other modeling techniques. We also discuss some important issues in building QSAR models and good practices for QSAR studies.
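The two distinguishing features named in the abstract (a descriptor subset of unconstrained size and a hidden-layer size searched alongside it) can be illustrated with a small genetic-algorithm skeleton. This is a minimal sketch under stated assumptions, not the authors' implementation: the chromosome encoding (a bit mask plus an integer hidden-layer gene), tournament selection, uniform crossover, elitism, and all names and parameter values are hypothetical choices for illustration. The fitness function is a toy stand-in; in a real QSAR run it would train a network with `c.hidden` neurons on the selected descriptor columns and return a validation error.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Chromosome:
    mask: Tuple[bool, ...]  # which descriptors are selected (any number may be on)
    hidden: int             # hypothetical gene: number of hidden neurons


def random_chromosome(n_desc: int, h_min: int, h_max: int, rng: random.Random) -> Chromosome:
    mask = tuple(rng.random() < 0.5 for _ in range(n_desc))
    return Chromosome(mask, rng.randint(h_min, h_max))


def mutate(c: Chromosome, rate: float, h_min: int, h_max: int, rng: random.Random) -> Chromosome:
    # Independent bit flips let the subset grow or shrink: no fixed descriptor count.
    mask = tuple((not b) if rng.random() < rate else b for b in c.mask)
    hidden = c.hidden
    if rng.random() < 0.2:  # occasionally grow or shrink the hidden layer by one neuron
        hidden = min(h_max, max(h_min, hidden + rng.choice((-1, 1))))
    return Chromosome(mask, hidden)


def crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Chromosome:
    # Uniform crossover on the mask; the hidden-layer gene comes from either parent.
    mask = tuple(x if rng.random() < 0.5 else y for x, y in zip(a.mask, b.mask))
    return Chromosome(mask, rng.choice((a.hidden, b.hidden)))


def tournament(pop: List[Chromosome], scores: List[float], k: int, rng: random.Random) -> Chromosome:
    # Lower score is better (e.g. a cross-validated prediction error).
    contenders = rng.sample(range(len(pop)), k)
    return pop[min(contenders, key=lambda i: scores[i])]


def evolve(fitness: Callable[[Chromosome], float], n_desc: int, h_min: int = 1, h_max: int = 10,
           pop_size: int = 30, generations: int = 40, rate: float = 0.05, seed: int = 0):
    rng = random.Random(seed)
    pop = [random_chromosome(n_desc, h_min, h_max, rng) for _ in range(pop_size)]
    best, best_score = None, float("inf")
    for _ in range(generations):
        scores = [fitness(c) for c in pop]
        for c, s in zip(pop, scores):
            if s < best_score:
                best, best_score = c, s
        nxt = [best]  # elitism: the best chromosome seen so far survives unchanged
        while len(nxt) < pop_size:
            child = crossover(tournament(pop, scores, 3, rng), tournament(pop, scores, 3, rng), rng)
            nxt.append(mutate(child, rate, h_min, h_max, rng))
        pop = nxt
    return best, best_score


# Toy stand-in fitness (hypothetical): descriptors 0, 3 and 7 are "relevant" and
# 4 hidden neurons is "ideal". A real run would return a trained network's validation error.
RELEVANT = {0, 3, 7}


def toy_fitness(c: Chromosome) -> float:
    chosen = {i for i, b in enumerate(c.mask) if b}
    return len(chosen ^ RELEVANT) + 0.1 * abs(c.hidden - 4)


best, score = evolve(toy_fitness, n_desc=12)
print(sorted(i for i, b in enumerate(best.mask) if b), best.hidden, score)
```

Because mutation flips each bit independently, the number of selected descriptors is free to drift during the search, and because the hidden-layer size is carried in the same chromosome, subset and architecture are optimized jointly rather than in separate stages.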