Assessing Drug Target Association Using Semantic Linked Data

Open Access

5 July 2012

journal article
research article
Published by Public Library of Science (PLoS) in PLoS Computational Biology

Vol. 8 (7), e1002574
https://doi.org/10.1371/journal.pcbi.1002574

Abstract

The rapidly increasing amount of public data in chemistry and biology provides new opportunities for large-scale data mining for drug discovery. Systematic integration of these heterogeneous sets, together with algorithms to mine the integrated sets, would permit investigation of the complex mechanisms of action of drugs. In this work we integrated and annotated data from public datasets relating to drugs, chemical compounds, protein targets, diseases, side effects and pathways, building a semantic linked network consisting of over 290,000 nodes and 720,000 edges. We developed a statistical model to assess the association of drug-target pairs based on their relations with other linked objects. Validation experiments demonstrate that the model can correctly identify known direct drug-target pairs with high precision. Indirect drug-target pairs (for example, drugs which change gene expression levels) are also identified, but not as strongly as direct pairs. We further calculated the association scores for 157 drugs from 10 disease areas against 1683 human targets, and measured their similarity using a score matrix. The similarity network indicates that drugs from the same disease area tend to cluster together in ways that are not captured by structural similarity, with several potential new drug pairings being identified. This work thus provides a novel, validated alternative to existing drug target prediction algorithms. The web service is freely available at: http://chem2bio2rdf.org/slap.

Author Summary

Modern drug discovery requires an understanding of chemogenomics: the complex interaction of chemical compounds and drugs with a wide variety of protein targets and genes in the body. A large amount of data pertaining to such relationships exists in publicly accessible datasets, but it is siloed and thus impossible to use in an integrated fashion. In this work we have integrated and semantically annotated a large amount of public data from a wide range of databases, covering compound-gene, drug-drug, protein-protein, drug-side effect and other relationships, to create a complex network of interactions relating to compounds and protein targets. We developed a statistical algorithm called Semantic Link Association Prediction (SLAP) for predicting “missing links” in this data network, i.e., compound-target interactions for which there is no experimental data but which are statistically probable given the other relationships that exist in this set. We present validation experiments which show that this method works with a high degree of accuracy, and also demonstrate how it can be used to create a drug similarity network to make predictions of new indications for existing drugs.
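
The two ideas described above, scoring a drug-target pair from the other linked objects that connect it and then comparing drugs by their rows in a score matrix, can be illustrated with a small sketch. This is not the SLAP statistical model, whose details are not given here. It assumes a raw count of short indirect paths as a stand-in association score and cosine similarity as a stand-in drug similarity measure. Every node name, edge, function name and the `max_len` cutoff below is hypothetical.

```python
from math import sqrt

# Hypothetical toy network: all nodes and edges are invented for illustration.
EDGES = [
    ("drug:A", "target:T1"),
    ("drug:A", "side_effect:S1"),
    ("drug:B", "side_effect:S1"),
    ("drug:B", "target:T1"),
    ("drug:B", "target:T2"),
    ("drug:C", "pathway:P1"),
    ("target:T1", "pathway:P1"),
    ("target:T2", "pathway:P1"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (node, node) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_paths(adj, source, goal, max_len=3):
    """Count simple paths from source to goal that use at most max_len edges.

    Assumption: a plain path count stands in for an association score.
    """
    def walk(node, visited, depth):
        if node == goal:
            return 1
        if depth == max_len:
            return 0
        return sum(
            walk(nxt, visited | {nxt}, depth + 1)
            for nxt in adj.get(node, ())
            if nxt not in visited
        )

    return walk(source, {source}, 0)


def cosine(u, v):
    """Cosine similarity of two equal-length score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


adj = build_adjacency(EDGES)
drugs = ["drug:A", "drug:B", "drug:C"]
targets = ["target:T1", "target:T2"]

# Score matrix: one row per drug, one column per target.
scores = {d: [count_paths(adj, d, t) for t in targets] for d in drugs}

# Pairwise drug similarity computed from the rows of the score matrix.
similarity = {
    (a, b): cosine(scores[a], scores[b])
    for i, a in enumerate(drugs)
    for b in drugs[i + 1:]
}

for pair, value in similarity.items():
    print(pair, round(value, 3))
```

In this toy network drug:A has no direct edge to target:T2, yet three indirect paths connect them through a shared target, a shared side effect and a shared pathway. A pair like this is the kind of "missing link" candidate the summary describes, although a real model would need a statistical treatment of path significance rather than a raw count.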
