Algebraic graph-assisted bidirectional transformers for molecular property prediction
Open Access
- 10 June 2021
- journal article
- research article
- Published by Springer Science and Business Media LLC in Nature Communications
- Vol. 12 (1), 1-9
- https://doi.org/10.1038/s41467-021-23720-w
Abstract
The ability to predict molecular properties is of great significance to drug discovery, human health, and environmental protection. Despite considerable effort, quantitative prediction of various molecular properties remains a challenge. Although some machine learning models, such as bidirectional encoders from transformers, can incorporate massive unlabeled molecular data into molecular representations via a self-supervised learning strategy, they neglect three-dimensional (3D) stereochemical information. Algebraic graphs, specifically element-specific multiscale weighted colored algebraic graphs, embed complementary 3D molecular information into graph invariants. We propose an algebraic graph-assisted bidirectional transformer (AGBT) framework that fuses representations generated by algebraic graphs and bidirectional transformers and combines them with a variety of machine learning algorithms, including decision trees, multitask learning, and deep neural networks. We validate the proposed AGBT framework on eight molecular datasets covering quantitative toxicity, physical chemistry, and physiology. Extensive numerical experiments show that AGBT is a state-of-the-art framework for molecular property prediction.
This publication has 43 references indexed in Scilit.