A secure distributed logistic regression protocol for the detection of rare adverse drug events

Open Access

7 August 2012

journal article
Published by Oxford University Press (OUP) in Journal of the American Medical Informatics Association

Vol. 20 (3), 453-461
https://doi.org/10.1136/amiajnl-2011-000735

Abstract

Background There is limited capacity to assess the comparative risks of medications after they enter the market. For rare adverse events, the pooling of data from multiple sources is necessary to have the power and sufficient population heterogeneity to detect differences in safety and effectiveness in genetic, ethnic and clinically defined subpopulations. However, combining datasets from different data custodians or jurisdictions to perform an analysis on the pooled data creates significant privacy concerns that would need to be addressed. Existing protocols for addressing these concerns can result in reduced analysis accuracy and can allow sensitive information to leak.

Objective To develop a secure distributed multi-party computation protocol for logistic regression that provides strong privacy guarantees.

Methods We developed a secure distributed logistic regression protocol using a single analysis center with multiple sites providing data. A theoretical security analysis demonstrates that the protocol is robust to plausible collusion attacks and does not allow the parties to gain new information from the data that are exchanged among them. The computational performance and accuracy of the protocol were evaluated on simulated datasets.

Results The computational performance scales linearly as the dataset sizes increase. The addition of sites results in an exponential growth in computation time. However, for up to five sites, the time is still short and would not affect practical applications. The model parameters are the same as the results on pooled raw data analyzed in SAS, demonstrating high model accuracy.

Conclusion The proposed protocol and prototype system would allow the development of logistic regression models in a secure manner without requiring the sharing of personal health information. This can alleviate one of the key barriers to the establishment of large-scale post-marketing surveillance programs.
We extended the secure protocol to account for correlations among patients within sites through generalized estimating equations, and to accommodate other link functions by extending it to generalized linear models.
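
The claim that a distributed fit can reproduce the pooled-data estimates can be illustrated with a minimal sketch. It is not the authors' protocol: it omits the cryptographic layer entirely and only shows the underlying arithmetic, assuming a Newton-Raphson fit in which each site sends the analysis center the gradient and Hessian sums computed on its own rows. The function names, the simulated data, and the choice of three sites are all hypothetical.

```python
import math
import random


def local_summaries(X, y, beta):
    """Run at one site: gradient and Hessian sums of the log-likelihood on local rows."""
    p = len(beta)
    grad = [0.0] * p
    hess = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        mu = 1.0 / (1.0 + math.exp(-eta))
        w = mu * (1.0 - mu)
        for j in range(p):
            grad[j] += (yi - mu) * xi[j]
            for k in range(p):
                hess[j][k] += w * xi[j] * xi[k]
    return grad, hess


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit(sites, p, iters=25):
    """Run at the analysis center: Newton-Raphson using only per-site sums."""
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for X, y in sites:
            g, h = local_summaries(X, y, beta)
            grad = [a + b for a, b in zip(grad, g)]
            hess = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(hess, h)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def simulate_site(n, rng):
    """Hypothetical simulated data: intercept plus one covariate."""
    X, y = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        prob = 1.0 / (1.0 + math.exp(-(-1.0 + 0.8 * x)))
        X.append([1.0, x])
        y.append(1.0 if rng.random() < prob else 0.0)
    return X, y


rng = random.Random(0)
sites = [simulate_site(200, rng) for _ in range(3)]

# Distributed fit: the center only ever sees per-site gradient and Hessian sums.
beta_distributed = fit(sites, p=2)

# Pooled fit: all rows are stacked in one place, as in a conventional analysis.
pooled_X = [row for X, _ in sites for row in X]
pooled_y = [v for _, y in sites for v in y]
beta_pooled = fit([(pooled_X, pooled_y)], p=2)
```

Because the log-likelihood gradient and Hessian are sums over patients, adding the per-site sums gives the same Newton-Raphson update as the pooled data, so the two estimates agree up to floating-point rounding. In this unprotected form the per-site sums are visible to the center; a secure protocol would additionally need to hide them.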
