Utilizing Deep Learning for Detecting Adverse Drug Events in Structured and Unstructured Regulatory Drug Data Sets

Abstract
Background The US Food and Drug Administration (FDA) collects and retains several data sets on post-market drugs and associated adverse events (AEs). The FDA Adverse Event Reporting System (FAERS) contains millions of AE reports submitted by the public when a medication is suspected to have caused an AE. The FDA monitors these reports to identify drug safety issues that went undetected during premarket evaluation. Each report contains a patient narrative describing the AE, which must be coded using standardized terminology so that reports can be aggregated for further review. Additionally, the FDA collects structured drug product labels (SPLs) that facilitate standardized distribution of information regarding marketed medical products. Manufacturers are currently not required to code labels with associated AEs.
Objectives Approaches for automated classification of reports by preferred terminology could enhance regulatory efficiency. The goal of this work was to assess the suitability of manually annotated FAERS and SPL data sets for predictive modeling.
Methods A recurrent neural network (RNN) was proposed as a proof-of-concept model for automated extraction of preferred AE terminology. A separate RNN was fit and cross-validated on each of two regulatory data sets with differing properties. First, the researchers trained and cross-validated a model on 325 annotated FAERS patient narratives for a sample of AE terms. A model was then trained and validated on a data set of 100 SPLs.
Results For product labels, cross-validation showed that, as measured by F1-score, the model performed at least as well as more conventional models for all but one of the selected terms. Model results for the FAERS data set were mixed.
Conclusions This work demonstrated a proof-of-concept machine learning approach for automatically detecting AEs in two textual regulatory data sets to support post-market regulatory activities. The limited number of instances of each AE class likely prevented the models from generalizing effectively. Additional data may permit more robust validation.
Funding Information
  • U.S. Food and Drug Administration (U01 FD005946)