Classification of a Diverse Set of Tetrahymena pyriformis Toxicity Chemical Compounds from Molecular Descriptors by Statistical Learning Methods
- 18 July 2006
- journal article
- research article
- Published by American Chemical Society (ACS) in Chemical Research in Toxicology
- Vol. 19 (8), 1030-1039
- https://doi.org/10.1021/tx0600550
Abstract
The toxicity of various compounds has been measured in many studies through their toxic effects on Tetrahymena pyriformis. Efforts have also been made to use computational quantitative structure–activity relationship (QSAR) and statistical learning methods (SLMs) to predict Tetrahymena pyriformis toxicity (TPT) with impressive accuracy. Because of the diversity of compounds and toxicity mechanisms, it is desirable to explore additional methods and to examine whether these methods are applicable to more diverse sets of compounds. We tested several SLMs (logistic regression, C4.5 decision tree, k-nearest neighbor, probabilistic neural network, and support vector machines) for their capability to predict TPT, using 1129 compounds (841 TPT and 288 non-TPT agents) that are more diverse than those in other studies. A feature selection method was used to improve prediction performance and to select the molecular descriptors responsible for distinguishing TPT from non-TPT agents. Based on 5-fold cross-validation, the prediction accuracies are 86.9–94.2% for TPT and 71.2–87.5% for non-TPT agents, which are comparable to those of some earlier studies despite the use of a more diverse set of compounds. The selected molecular descriptors are consistent with those used in other studies and with experimental findings. These results suggest that SLMs are useful for predicting the TPT potential of diverse sets of compounds and for characterizing the molecular descriptors associated with TPT.
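As a rough illustration of the evaluation protocol described in the abstract, the Python sketch below runs 5-fold cross-validation with a k-nearest neighbor classifier (one of the five SLMs tested) and reports accuracy separately for TPT (label 1) and non-TPT (label 0) compounds, i.e. sensitivity and specificity. It is not the authors' code: the stratified round-robin fold assignment, the Euclidean distance, k = 3, the helper names, and the toy descriptor data are all assumptions made for illustration, and the feature selection step is omitted.

```python
import math
import random
from collections import Counter


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds, keeping the class ratio similar in each fold.

    Assumption: stratification is illustrative; the abstract only says "5-fold".
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal this class's indices round-robin so every fold gets about 1/k of them.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    dists = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]


def cross_validate(X, y, k_folds=5, k_neighbors=3, seed=0):
    """Return (accuracy on class 1, accuracy on class 0), pooled over all folds."""
    correct = Counter()
    total = Counter()
    for fold in stratified_folds(y, k_folds, seed):
        test = set(fold)
        # Train on every sample outside the current fold.
        train_X = [X[i] for i in range(len(X)) if i not in test]
        train_y = [y[i] for i in range(len(y)) if i not in test]
        for i in fold:
            pred = knn_predict(train_X, train_y, X[i], k_neighbors)
            total[y[i]] += 1
            correct[y[i]] += int(pred == y[i])
    return correct[1] / total[1], correct[0] / total[0]


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy data: 84 "TPT" (label 1) and 29 "non-TPT" (label 0) compounds,
    # roughly the 841:288 ratio of the study, each with 4 made-up descriptor values.
    X = [[rng.gauss(1.0, 1.0) for _ in range(4)] for _ in range(84)]
    X += [[rng.gauss(-1.0, 1.0) for _ in range(4)] for _ in range(29)]
    y = [1] * 84 + [0] * 29
    tpt_acc, non_tpt_acc = cross_validate(X, y)
    print(f"TPT accuracy: {tpt_acc:.1%}, non-TPT accuracy: {non_tpt_acc:.1%}")
```

Reporting the two classes separately, rather than a single overall accuracy, matters with an imbalanced set like 841 vs. 288: a model that labels everything as TPT would still score about 74% overall while getting every non-TPT compound wrong.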