Drug drug interaction extraction from biomedical literature using syntax convolutional neural network
Open Access
- 27 July 2016
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 32 (22), 3444-3453
- https://doi.org/10.1093/bioinformatics/btw486
Abstract
Motivation: Detecting drug-drug interactions (DDIs) has become a vital part of public health safety. Therefore, using text mining techniques to extract DDIs from biomedical literature has received great attention. However, this research is still at an early stage, and its performance has much room for improvement.
Results: In this article, we present a syntax convolutional neural network (SCNN) based DDI extraction method. In this method, a novel word embedding, syntax word embedding, is proposed to employ the syntactic information of a sentence. Then position and part-of-speech features are introduced to extend the embedding of each word. Next, an auto-encoder is introduced to encode the traditional bag-of-words feature (a sparse 0–1 vector) as a dense real-valued vector. Finally, a combination of embedding-based convolutional features and traditional features is fed to the softmax classifier to extract DDIs from biomedical literature. Experimental results on the DDIExtraction 2013 corpus show that SCNN obtains better performance (an F-score of 0.686) than other state-of-the-art methods.
Availability and Implementation: The source code is available for academic use at http://202.118.75.18:8080/DDI/SCNN-DDI.zip.
Contact: yangzh@dlut.edu.cn
Supplementary information: Supplementary data are available at Bioinformatics online.
Funding Information
- the Natural Science Foundation of China (61070098, 61272373, 61340020, 61572102, 61572098)
- the Fundamental Research Funds for the Central Universities (DUT13JB09, DUT14YQ213)
- the Major State Research Development Program of China (2016YFC0901902)
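As a rough illustration of the pipeline the abstract describes (per-word embeddings extended with position and part-of-speech features, a convolution with max pooling, a dense encoding of the bag-of-words vector, and a softmax classifier over the combined features), the following is a minimal forward-pass sketch in Python. It is not the authors' implementation. All dimensions, the toy vocabulary, the tag set, the number of output classes, and the function names are hypothetical. The weights are random and untrained, a plain lookup table stands in for the syntax word embedding, and a single sigmoid layer stands in for the encoder half of the auto-encoder.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained toy weights are reproducible

def rand_matrix(rows, cols):
    """Small random weight matrix as a list of rows."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    """Matrix-vector product for list-based weights."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical sizes; none of these values come from the paper.
WORD_DIM, POS_DIM, DIST_DIM = 8, 3, 2
TOKEN_DIM = WORD_DIM + POS_DIM + 2 * DIST_DIM  # word + POS + two distances
WINDOW, N_FILTERS = 3, 6
BOW_DIM, BOW_CODE_DIM = 10, 4
N_CLASSES = 5   # assumed label count
MAX_DIST = 10   # relative positions are clipped to [-MAX_DIST, MAX_DIST]

vocab = {"DRUG1": 0, "increases": 1, "the": 2, "effect": 3, "of": 4, "DRUG2": 5}
pos_tags = {"NN": 0, "VB": 1, "DT": 2, "IN": 3}

word_table = rand_matrix(len(vocab), WORD_DIM)       # stand-in for syntax word embedding
pos_table = rand_matrix(len(pos_tags), POS_DIM)      # part-of-speech embedding
dist_table = rand_matrix(2 * MAX_DIST + 1, DIST_DIM) # position embedding
conv_filters = rand_matrix(N_FILTERS, WINDOW * TOKEN_DIM)
bow_encoder = rand_matrix(BOW_CODE_DIM, BOW_DIM)     # encoder half of an auto-encoder
out_weights = rand_matrix(N_CLASSES, N_FILTERS + BOW_CODE_DIM)

def token_vector(word, tag, index, drug1_index, drug2_index):
    """Extend the word embedding with POS and two relative-position embeddings."""
    def dist(offset):
        clipped = max(-MAX_DIST, min(MAX_DIST, offset))
        return dist_table[clipped + MAX_DIST]
    return (word_table[vocab[word]] + pos_table[pos_tags[tag]]
            + dist(index - drug1_index) + dist(index - drug2_index))

def classify(words, tags, drug1_index, drug2_index, bow):
    """Return class probabilities for one candidate drug pair in one sentence."""
    tokens = [token_vector(w, t, i, drug1_index, drug2_index)
              for i, (w, t) in enumerate(zip(words, tags))]
    # Convolution: apply every filter to each window of WINDOW consecutive tokens.
    windows = [sum(tokens[i:i + WINDOW], [])
               for i in range(len(tokens) - WINDOW + 1)]
    feature_maps = [[math.tanh(x) for x in matvec(conv_filters, win)]
                    for win in windows]
    # Max pooling over sentence positions, one value per filter.
    pooled = [max(column) for column in zip(*feature_maps)]
    # Dense real-valued code for the sparse 0-1 bag-of-words vector.
    bow_code = [1.0 / (1.0 + math.exp(-x)) for x in matvec(bow_encoder, bow)]
    # Combine convolutional and traditional features, then apply softmax.
    return softmax(matvec(out_weights, pooled + bow_code))

words = ["DRUG1", "increases", "the", "effect", "of", "DRUG2"]
tags = ["NN", "VB", "DT", "NN", "IN", "NN"]
bow = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
probs = classify(words, tags, 0, 5, bow)
print([round(p, 3) for p in probs])
```

Because the weights are untrained, the printed probabilities carry no meaning; the sketch only shows how the feature groups named in the abstract could be concatenated before classification.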