Size Doesn’t Matter: Predicting Physico- or Biochemical Properties Based on Dozens of Molecules

16 September 2021

journal article
research article
Published by American Chemical Society (ACS) in The Journal of Physical Chemistry Letters

Vol. 12 (38), 9213-9219
https://doi.org/10.1021/acs.jpclett.1c02477

Abstract

Machine learning has become common practice in chemistry. Despite the success of modern machine learning methods, however, a lack of data still limits their use. Transfer learning can help solve this problem. The methodology assumes that a model trained on a sufficiently large data set captures general features of the chemical structures it was trained on, and that reusing these features on a data set with few examples will greatly improve the quality of the new model. In this paper, we develop this approach for small organic molecules, implementing transfer learning with graph convolutional neural networks. The paper shows a significant improvement in model performance for target properties with scarce data. The effects of data set composition on model quality and the applicability domain of the resulting models are also considered.
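
The transfer learning idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the paper uses graph convolutional neural networks on molecular structures, whereas the toy below uses a small dense network on synthetic feature vectors. All names, sizes, and data here (`make_data`, `pretrain`, `fit_head`, 2000 source samples, 30 target samples) are assumptions chosen for illustration only. A feature layer is trained on a large source task, then frozen and reused, and only a small regression head is fitted on the data-poor target task.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, w_shared, w_task, noise=0.05):
    """Synthetic stand-in for molecular descriptors X and a property y."""
    X = rng.normal(size=(n, w_shared.shape[0]))
    hidden = np.tanh(X @ w_shared)  # structure shared by both tasks
    y = hidden @ w_task + noise * rng.normal(size=n)
    return X, y

def pretrain(X, y, n_hidden=8, lr=0.02, epochs=500):
    """Train a one-hidden-layer network by full-batch gradient descent.

    Returns only the first-layer weights, i.e. the reusable features.
    """
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    w2 = rng.normal(scale=0.5, size=n_hidden)
    n = len(y)
    for _ in range(epochs):
        H = np.tanh(X @ W1)
        err = H @ w2 - y
        grad_w2 = H.T @ err / n
        grad_W1 = X.T @ (np.outer(err, w2) * (1.0 - H**2)) / n
        W1 -= lr * grad_W1
        w2 -= lr * grad_w2
    return W1

def features(X, W1):
    """Apply the frozen, pretrained feature layer."""
    return np.tanh(X @ W1)

def fit_head(F, y, alpha=1e-2):
    """Fit a ridge regression head on fixed features F."""
    A = F.T @ F + alpha * np.eye(F.shape[1])
    return np.linalg.solve(A, F.T @ y)

# Hypothetical setup: both tasks depend on the same hidden structure.
n_features, n_hidden = 10, 8
w_shared = rng.normal(size=(n_features, n_hidden))
w_source = rng.normal(size=n_hidden)
w_target = rng.normal(size=n_hidden)

X_big, y_big = make_data(2000, w_shared, w_source)    # data-rich source task
X_small, y_small = make_data(30, w_shared, w_target)  # dozens of samples
X_test, y_test = make_data(500, w_shared, w_target)

W1 = pretrain(X_big, y_big)                         # learn general features
head = fit_head(features(X_small, W1), y_small)     # reuse them on small data
pred = features(X_test, W1) @ head
rmse_transfer = float(np.sqrt(np.mean((pred - y_test) ** 2)))
print(f"target-task RMSE with transferred features: {rmse_transfer:.3f}")
```

Freezing the feature layer means only `n_hidden` parameters are estimated from the 30 target samples, which is the reason such a scheme can work when the target data set is small. Whether it actually helps depends on how much structure the source and target tasks share.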
