A secure distributed logistic regression protocol for the detection of rare adverse drug events
Open Access
- 7 August 2012
- journal article
- Published by Oxford University Press (OUP) in Journal of the American Medical Informatics Association
- Vol. 20 (3), 453-461
- https://doi.org/10.1136/amiajnl-2011-000735
Abstract
Background There is limited capacity to assess the comparative risks of medications after they enter the market. For rare adverse events, the pooling of data from multiple sources is necessary to have the power and sufficient population heterogeneity to detect differences in safety and effectiveness in genetic, ethnic and clinically defined subpopulations. However, combining datasets from different data custodians or jurisdictions to perform an analysis on the pooled data creates significant privacy concerns that would need to be addressed. Existing protocols for addressing these concerns can result in reduced analysis accuracy and can allow sensitive information to leak.
Objective To develop a secure distributed multi-party computation protocol for logistic regression that provides strong privacy guarantees.
Methods We developed a secure distributed logistic regression protocol using a single analysis center with multiple sites providing data. A theoretical security analysis demonstrates that the protocol is robust to plausible collusion attacks and does not allow the parties to gain new information from the data that are exchanged among them. The computational performance and accuracy of the protocol were evaluated on simulated datasets.
Results The computational performance scales linearly as the dataset sizes increase. The addition of sites results in an exponential growth in computation time. However, for up to five sites, the time is still short and would not affect practical applications. The model parameters are the same as the results on pooled raw data analyzed in SAS, demonstrating high model accuracy.
Conclusion The proposed protocol and prototype system would allow the development of logistic regression models in a secure manner without requiring the sharing of personal health information. This can alleviate one of the key barriers to the establishment of large-scale post-marketing surveillance programs.
We extended the secure protocol to account for correlations among patients within sites through generalized estimating equations, and to accommodate other link functions by extending it to generalized linear models.
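The abstract does not spell out the protocol's steps, so the following is only a minimal sketch of the general idea behind distributed logistic regression, not the authors' method. It assumes each site sends the analysis center nothing but its local gradient and Hessian contributions, and that the center runs Newton-Raphson on their sums. The cryptographic layer that the paper's protocol adds to protect those intermediate values is deliberately omitted here. All names (`local_summaries`, `fit`, `simulate`) and the simulated data are hypothetical.

```python
import math
import random


def local_summaries(X, y, beta):
    """Site-side step: gradient and Hessian contributions from local rows only."""
    k = len(beta)
    grad = [0.0] * k
    hess = [[0.0] * k for _ in range(k)]
    for row, label in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        p = 1.0 / (1.0 + math.exp(-eta))
        w = p * (1.0 - p)
        for a in range(k):
            grad[a] += row[a] * (label - p)
            for b in range(k):
                hess[a][b] += w * row[a] * row[b]
    return grad, hess


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit(sites, k, iters=25):
    """Center-side step: Newton-Raphson using only the summed site summaries."""
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for X, y in sites:
            g, h = local_summaries(X, y, beta)
            grad = [u + v for u, v in zip(grad, g)]
            hess = [[u + v for u, v in zip(ru, rv)] for ru, rv in zip(hess, h)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def simulate(n, rng):
    """Hypothetical site data: an intercept column plus one covariate."""
    X, y = [], []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(-1.0 + 0.8 * x1)))
        X.append([1.0, x1])
        y.append(1 if rng.random() < p else 0)
    return X, y


rng = random.Random(0)
sites = [simulate(200, rng) for _ in range(3)]

# Distributed fit: the center never sees individual rows.
beta_distributed = fit(sites, k=2)

# Reference fit on the pooled rows, treated as a single site.
pooled_X = [row for X, _ in sites for row in X]
pooled_y = [label for _, y in sites for label in y]
beta_pooled = fit([(pooled_X, pooled_y)], k=2)
```

Because the log-likelihood is a sum over patients, the summed site contributions equal the pooled gradient and Hessian, so `beta_distributed` and `beta_pooled` agree up to floating-point rounding. This mirrors the kind of check the abstract reports against pooled data analyzed in SAS.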