Exploring the octanol–water partition coefficient dataset using deep learning techniques and data augmentation
Open Access
- 14 June 2021
- journal article
- research article
- Published by Springer Science and Business Media LLC in Communications Chemistry
- Vol. 4 (1), 1-10
- https://doi.org/10.1038/s42004-021-00528-9
Abstract
Today more and more data are freely available. Based on these big datasets, deep neural networks (DNNs) are rapidly gaining relevance in computational chemistry. Here, we explore the potential of DNNs to predict chemical properties from chemical structures. We selected the octanol-water partition coefficient (log P) as an example; it plays an essential role in environmental chemistry and toxicology, but also in chemical analysis. The developed DNN shows good predictive performance, with a root-mean-square error (RMSE) of 0.47 log units on the test dataset and an RMSE of 0.33 on an external dataset from the SAMPL6 challenge. To achieve this, we trained the DNN using data augmentation that considers all potential tautomeric forms of the chemicals. We further demonstrate how DNN models can help in the curation of the log P dataset by identifying potential errors, and we address limitations of the dataset itself.
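The two quantitative ideas in the abstract, the RMSE metric and augmentation over tautomeric forms, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names `rmse` and `augment_with_tautomers`, the toy enumerator, and the numeric label are assumptions made for illustration only. In practice the enumerator would come from a cheminformatics toolkit.

```python
import math
from typing import Callable, Iterable, List, Tuple


def rmse(predicted: List[float], measured: List[float]) -> float:
    """Root-mean-square error between predicted and measured log P values."""
    if not predicted or len(predicted) != len(measured):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    return math.sqrt(squared / len(predicted))


def augment_with_tautomers(
    records: Iterable[Tuple[str, float]],
    enumerate_tautomers: Callable[[str], Iterable[str]],
) -> List[Tuple[str, float]]:
    """Give every tautomeric form of a structure the same log P label.

    `enumerate_tautomers` is a hypothetical callable supplied by the caller;
    it maps one SMILES string to the SMILES strings of its tautomers.
    """
    augmented = []
    for smiles, log_p in records:
        # Always keep the original form, and drop duplicate forms.
        forms = set(enumerate_tautomers(smiles)) | {smiles}
        for form in sorted(forms):
            augmented.append((form, log_p))
    return augmented


# Toy enumerator (assumption): 2-hydroxypyridine and its 2-pyridone tautomer.
TOY_TAUTOMERS = {"Oc1ccccn1": ["O=C1C=CC=CN1"]}


def toy_enumerator(smiles: str) -> List[str]:
    return TOY_TAUTOMERS.get(smiles, [])


# The label 1.0 is a placeholder, not a measured log P value.
training_set = augment_with_tautomers([("Oc1ccccn1", 1.0)], toy_enumerator)
```

Under these assumptions, a single labelled structure expands into one training record per tautomeric form, all sharing the same measured value. A model trained on such records sees each chemical in every form it may take.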