DeepAffinity: interpretable deep learning of compound–protein affinity through unified recurrent and convolutional neural networks


15 February 2019

Journal article (research article)
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 35 (18), 3329-3338
https://doi.org/10.1093/bioinformatics/btz111

Abstract

Drug discovery demands rapid quantification of compound–protein interaction (CPI). However, there is a lack of methods that can predict compound–protein affinity from sequences alone with high applicability, accuracy and interpretability.

We present a seamless integration of domain knowledge and learning-based approaches. Under novel representations of structurally annotated protein sequences, a semi-supervised deep learning model that unifies recurrent and convolutional neural networks is proposed to exploit both unlabeled and labeled data, jointly encoding molecular representations and predicting affinities. Our representations and models outperform conventional options, achieving relative errors in IC₅₀ within 5-fold for test cases and within 20-fold for protein classes not included in training. Performance for new protein classes with few labeled data is further improved by transfer learning. Furthermore, separate and joint attention mechanisms are developed and embedded in our model to enhance its interpretability, as illustrated in case studies for predicting and explaining selective drug–target interactions. Lastly, alternative representations using protein sequences or compound graphs, and a unified RNN/GCNN-CNN model using graph CNN (GCNN), are also explored to reveal algorithmic challenges ahead.

Data and source code are available at https://github.com/Shen-Lab/DeepAffinity. Supplementary data are available at Bioinformatics online.
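The abstract reports accuracy as a relative error in IC₅₀ "within 5-fold" or "within 20-fold". The sketch below shows one way to compute such a fold error and how it maps onto an absolute error on the pIC₅₀ = −log₁₀(IC₅₀) scale. It is a minimal illustration, not DeepAffinity's evaluation code: the helper names (`fold_error`, `pic50`, `fold_error_from_pic50`, `within_fold`) are hypothetical, and it assumes fold error is defined symmetrically as max(pred/true, true/pred).

```python
import math


def fold_error(ic50_pred: float, ic50_true: float) -> float:
    """Hypothetical helper: fold error between predicted and measured IC50.

    Both values must be positive and in the same unit (e.g. nM). The result
    is always >= 1; a value of 5.0 means the prediction is off by a factor
    of 5 in either direction (assumed symmetric definition).
    """
    if ic50_pred <= 0 or ic50_true <= 0:
        raise ValueError("IC50 values must be positive")
    ratio = ic50_pred / ic50_true
    return max(ratio, 1.0 / ratio)


def pic50(ic50_molar: float) -> float:
    """Hypothetical helper: convert an IC50 in mol/L to pIC50 = -log10(IC50)."""
    return -math.log10(ic50_molar)


def fold_error_from_pic50(pic50_pred: float, pic50_true: float) -> float:
    """Fold error implied by an absolute error on the pIC50 (log10) scale."""
    return 10 ** abs(pic50_pred - pic50_true)


def within_fold(ic50_pred: float, ic50_true: float, fold: float) -> bool:
    """True if the prediction lies within the given fold of the measurement."""
    return fold_error(ic50_pred, ic50_true) <= fold


# Absolute pIC50 errors corresponding to the 5-fold and 20-fold thresholds.
for fold in (5, 20):
    print(f"{fold}-fold: |delta pIC50| <= {math.log10(fold):.2f}")
# prints:
# 5-fold: |delta pIC50| <= 0.70
# 20-fold: |delta pIC50| <= 1.30
```

Because the fold error is symmetric on a log scale, predicting 50 nM against a measured 10 nM and predicting 10 nM against a measured 50 nM both count as 5-fold errors. Working in pIC₅₀ turns the multiplicative threshold into an additive one, which is why log-scale affinities are convenient regression targets.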

Other Versions

Version 1, 2018-06-20, preprints

Funding Information

National Institute of General Medical Sciences
National Institutes of Health (R35GM124952)
Defense Advanced Research Projects Agency (FA8750-18-2-0027)
Texas A&M High Performance Research Computing
