Predicting Adverse Drug Reactions from Social Media Posts: Data Balance, Feature Selection and Deep Learning

Open Access

25 March 2022

journal article
research article
Published by MDPI AG in Healthcare

Vol. 10 (4), 618
https://doi.org/10.3390/healthcare10040618

Abstract

Social forums offer many new channels for collecting patients’ opinions, which can be used to construct predictive models of adverse drug reactions (ADRs) for post-marketing surveillance. However, the characteristics of social posts leave many challenges unsolved when deriving such models, chiefly data sparseness, high-dimensional features, and term diversity. To tackle these issues in identifying ADRs from social posts, we perform data analytics from the perspectives of data balance, feature selection, and feature learning. We also design a comprehensive experimental analysis to investigate the performance of different data processing techniques and data modeling methods. Most importantly, we present a deep learning-based approach that adopts the BERT (Bidirectional Encoder Representations from Transformers) model with a new batch-wise adaptive strategy to enhance predictive performance. A series of experiments was conducted to evaluate machine learning methods with both manual and automated feature engineering processes. The results show that both types of methods are effective in ADR prediction, each with its own advantages. In contrast to the traditional machine learning methods, our feature learning approach performs the task automatically, saving the manual effort otherwise required by the large number of experiments.
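
The data-balance step named in the abstract can be illustrated with a minimal sketch of random oversampling, one common way to even out a skewed class distribution before training a classifier. This is not the authors' pipeline: the abstract does not say which balancing technique they use, and the function name `random_oversample`, the toy posts, and the 0/1 labels below are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen examples of the smaller classes until
    every class has as many examples as the largest class."""
    rng = random.Random(seed)

    # Group the examples by their class label.
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)

    # Every class is grown to the size of the largest one.
    target = max(len(items) for items in by_class.values())

    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Sample with replacement to fill the gap (empty for the largest class).
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Hypothetical toy data: label 1 marks a post that mentions an ADR.
posts = [
    "started the new medication last week",
    "picked up my refill today",
    "doctor changed my dosage",
    "taking it with food as advised",
    "this drug gave me a terrible headache",
]
labels = [0, 0, 0, 0, 1]

balanced_posts, balanced_labels = random_oversample(posts, labels)
print(Counter(balanced_labels))
```

Oversampling should be applied only to the training split, so that duplicated posts do not leak into the evaluation data.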
