Constrained Bayesian optimization for automatic chemical design using variational autoencoders
Open Access
- 18 November 2019
- journal article
- research article
- Published by Royal Society of Chemistry (RSC) in Chemical Science
- Vol. 11 (2), 577-586
- https://doi.org/10.1039/c9sc04026a
Abstract
Automatic Chemical Design is a framework for generating novel molecules with optimized properties. The original scheme, featuring Bayesian optimization over the latent space of a variational autoencoder, suffers from the pathology that it tends to produce invalid molecular structures. First, we demonstrate empirically that this pathology arises when the Bayesian optimization scheme queries latent space points far away from the data on which the variational autoencoder has been trained. Secondly, by reformulating the search procedure as a constrained Bayesian optimization problem, we show that the effects of this pathology can be mitigated, yielding marked improvements in the validity of the generated molecules. We posit that constrained Bayesian optimization is a good approach for solving this kind of training set mismatch in many generative tasks involving Bayesian optimization over the latent space of a variational autoencoder.
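The abstract does not spell out the acquisition function, so the following is only a minimal sketch of one common way to pose constrained Bayesian optimization: expected improvement weighted by the probability that a latent point satisfies the constraint (here, that it decodes to a valid molecule). The function names, the Gaussian posterior summary (`mean`, `std`), and the validity probability `p_valid` are illustrative assumptions, not taken from the paper.

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_improvement(mean, std, best):
    """Expected improvement over `best` for a maximization problem,
    assuming a Gaussian posterior with the given mean and std."""
    if std <= 0.0:
        return max(mean - best, 0.0)
    z = (mean - best) / std
    return (mean - best) * normal_cdf(z) + std * normal_pdf(z)


def constrained_ei(mean, std, best, p_valid):
    """Expected improvement weighted by the (assumed) probability that
    the latent point decodes to a valid molecule."""
    return expected_improvement(mean, std, best) * p_valid


# Hypothetical candidates: one far from the training data (promising
# objective, unlikely to decode validly) and one close to it.
far = dict(mean=2.0, std=1.0, p_valid=0.05)
near = dict(mean=1.2, std=0.3, p_valid=0.9)
best_so_far = 1.0

for name, c in (("far", far), ("near", near)):
    ei = expected_improvement(c["mean"], c["std"], best_so_far)
    eic = constrained_ei(c["mean"], c["std"], best_so_far, c["p_valid"])
    print(f"{name}: EI={ei:.3f}, constrained EI={eic:.3f}")
```

With these made-up numbers, plain expected improvement prefers the far point, while the constrained version prefers the near one, which is the behaviour the abstract attributes to the constrained formulation.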