Early Detection of Adverse Drug Reactions in Social Health Networks: A Natural Language Processing Pipeline for Signal Detection
Open Access
- 3 June 2019
- journal article
- research article
- Published by JMIR Publications Inc. in JMIR Public Health and Surveillance
- Vol. 5 (2), e11264-216
- https://doi.org/10.2196/11264
Abstract
Background: Adverse drug reactions (ADRs) occur in nearly all patients on chemotherapy, causing morbidity and therapy disruptions. Detection of such ADRs is limited in clinical trials, which are underpowered to detect rare events. Early recognition of ADRs in the post-marketing phase could substantially reduce morbidity and decrease societal costs. Internet community health forums provide a mechanism for individuals to discuss real-time health concerns and can enable computational detection of ADRs.
Objective: To identify cutaneous ADR signals in social health networks and compare the frequency and timing of these ADRs to clinical reports in the literature.
Methods: We present a natural language processing (NLP)-based ADR signal generation pipeline that draws on patient posts in internet social health networks. We identify user posts from the Inspire health forum related to two chemotherapy classes: erlotinib, an epidermal growth factor receptor inhibitor, and nivolumab and pembrolizumab, immune checkpoint inhibitors. We extract mentions of ADRs from the unstructured content of patient posts. We then perform population-level association analyses and time-to-detection analyses.
Results: Our system detected cutaneous ADRs from patient reports with high precision (0.90) and at frequencies comparable to those documented in the literature, but an average of 7 months ahead of their literature reporting. Known ADRs were associated with higher proportional reporting ratios compared to negative controls, demonstrating the robustness of our analyses. Our named entity recognition system achieved a micro-averaged F-measure of 0.738 in detecting ADR entities (not limited to cutaneous ADRs) in health forum posts. Additionally, we discovered the novel ADR of hypohidrosis, reported by 23 patients in erlotinib-related posts; this ADR was absent from 15 years of literature on this medication, and we recently reported the finding in a clinical oncology journal.
Conclusions: Several hundred million patients report health concerns in social health networks, yet this information is markedly underutilized for pharmacosurveillance. We demonstrate the ability of an NLP-based signal generation pipeline to accurately detect patient reports of ADRs months in advance of literature reporting, and the robustness of statistical analyses to validate system detections. Our findings suggest that social health network data can make important contributions to more comprehensive and timely pharmacovigilance.
This publication has 44 references indexed in Scilit.
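The abstract reports proportional reporting ratios (PRRs) and a micro-averaged F-measure without giving formulas. The following is a minimal sketch of how these two quantities are commonly computed, assuming the standard 2x2 contingency definition of the PRR and micro-averaging over pooled true positive, false positive, and false negative counts. The function names and all counts are hypothetical illustrations, not the authors' implementation or data.

```python
from typing import Iterable, Tuple


def proportional_reporting_ratio(a: int, b: int, c: int, d: int) -> float:
    """PRR from a 2x2 table of post counts (all counts here are hypothetical).

    a: posts about the drug of interest that mention the reaction
    b: posts about the drug of interest that do not mention it
    c: posts about comparator drugs that mention the reaction
    d: posts about comparator drugs that do not mention it
    """
    if a + b == 0 or c + d == 0:
        raise ValueError("each group needs at least one post")
    drug_rate = a / (a + b)
    comparator_rate = c / (c + d)
    if comparator_rate == 0:
        # The reaction never appears in the comparator group.
        return float("inf")
    return drug_rate / comparator_rate


def micro_f1(counts: Iterable[Tuple[int, int, int]]) -> float:
    """Micro-averaged F1 from per-entity-type (tp, fp, fn) counts.

    Counts are pooled across entity types before precision and recall
    are computed, so frequent types weigh more than rare ones.
    """
    tp = fp = fn = 0
    for type_tp, type_fp, type_fn in counts:
        tp += type_tp
        fp += type_fp
        fn += type_fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: reaction in 25 of 100 drug posts versus
    # 25 of 200 comparator posts.
    print(proportional_reporting_ratio(25, 75, 25, 175))
    # Hypothetical (tp, fp, fn) counts for two entity types.
    print(micro_f1([(8, 2, 2), (2, 0, 2)]))
```

A PRR above 1 means the reaction is mentioned proportionally more often for the drug of interest than for the comparators, which is the sense in which the abstract contrasts known ADRs with negative controls.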