Predicting Adverse Drug Reactions from Social Media Posts: Data Balance, Feature Selection and Deep Learning
Open Access
- 25 March 2022
- journal article
- research article
- Published by MDPI AG in Healthcare
- Vol. 10 (4), 618
- https://doi.org/10.3390/healthcare10040618
Abstract
Social forums offer many new channels for collecting patients' opinions, which can be used to construct predictive models of adverse drug reactions (ADRs) for post-marketing surveillance. However, because of the characteristics of social posts, deriving such models still poses many challenges, chiefly those caused by data sparseness, high-dimensional features, and term diversity. To tackle these issues in identifying ADRs from social posts, we perform data analytics from three perspectives: data balance, feature selection, and feature learning. We also design a comprehensive experimental analysis to investigate the performance of different data processing techniques and data modeling methods. Most importantly, we present a deep learning-based approach that adopts the BERT (Bidirectional Encoder Representations from Transformers) model with a new batch-wise adaptive strategy to enhance predictive performance. A series of experiments was conducted to evaluate machine learning methods with both manual and automated feature engineering processes. The results show that both types of methods are effective in ADR prediction, each with its own advantages. In contrast to traditional machine learning methods, our feature learning approach performs the task automatically, saving the manual effort that the large number of experiments would otherwise require.
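The abstract names data balance as one of the three perspectives but does not say which balancing techniques were compared. As an illustration only, the sketch below shows random oversampling, a common baseline in which minority-class posts (for example, posts that mention an ADR) are duplicated until every class matches the size of the largest one. The function name `random_oversample`, the toy posts, and the labels are hypothetical and are not taken from the paper; this is not the authors' method.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_oversample(
    samples: Sequence[T], labels: Sequence[int], seed: int = 0
) -> Tuple[List[T], List[int]]:
    """Hypothetical helper: balance classes by random oversampling.

    Randomly chosen samples of every smaller class are duplicated (with
    replacement) until each class has as many samples as the largest class.
    The original samples are kept, in order, at the front of the output.
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    if not labels:
        return [], []
    rng = random.Random(seed)  # seeded for reproducible resampling
    counts = Counter(labels)
    target = max(counts.values())  # size of the majority class
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):  # zero iterations for the majority class
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


if __name__ == "__main__":
    # Toy, hypothetical data: 1 = post mentions an ADR, 0 = it does not.
    posts = [
        "felt dizzy after taking it",
        "works fine for me",
        "no issues so far",
        "great results",
    ]
    labels = [1, 0, 0, 0]
    X, y = random_oversample(posts, labels)
    print(sorted(Counter(y).items()))  # [(0, 3), (1, 3)]
```

Resampling of this kind is normally applied only to the training split, so that duplicated posts do not leak into the evaluation data. Undersampling the majority class is the mirror-image alternative; it discards data instead of duplicating it.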