Privacy-Preserving Deep Learning
- 12 October 2015
- conference paper
- Published by Association for Computing Machinery (ACM)
Abstract
Deep learning based on artificial neural networks is a very popular approach to modeling, classifying, and recognizing complex data such as images, speech, and text. The unprecedented accuracy of deep learning methods has turned them into the foundation of new AI-based services on the Internet. Commercial companies that collect user data on a large scale have been the main beneficiaries of this trend since the success of deep learning techniques is directly proportional to the amount of data available for training. Massive data collection required for deep learning presents obvious privacy issues. Users' personal, highly sensitive data such as photos and voice recordings is kept indefinitely by the companies that collect it. Users can neither delete it, nor restrict the purposes for which it is used. Furthermore, centrally kept data is subject to legal subpoenas and extra-judicial surveillance. Many data owners--for example, medical institutions that may want to apply deep learning methods to clinical records--are prevented by privacy and confidentiality concerns from sharing the data and thus benefitting from large-scale deep learning. In this paper, we design, implement, and evaluate a practical system that enables multiple parties to jointly learn an accurate neural-network model for a given objective without sharing their input datasets. We exploit the fact that the optimization algorithms used in modern deep learning, namely, those based on stochastic gradient descent, can be parallelized and executed asynchronously. Our system lets participants train independently on their own datasets and selectively share small subsets of their models' key parameters during training. This offers an attractive point in the utility/privacy tradeoff space: participants preserve the privacy of their respective data while still benefitting from other participants' models and thus boosting their learning accuracy beyond what is achievable solely on their own inputs. 
We demonstrate the accuracy of our privacy-preserving deep learning on benchmark datasets.
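The mechanism the abstract describes, participants running SGD locally and uploading only a small, selected fraction of their parameter updates to a shared model, can be illustrated with a toy sketch. This is a hedged illustration, not the paper's implementation: the two-participant setup, the linear model standing in for a neural network, and the helper names (`local_gradient`, `select_fraction`) are all assumptions made for brevity; `theta` plays the role of the fraction of gradients each participant shares per round.

```python
import numpy as np

def local_gradient(weights, X, y):
    """Gradient of mean squared error for a linear model
    (a stand-in for a neural network's backpropagation step)."""
    return X.T @ (X @ weights - y) / len(y)

def select_fraction(grad, theta):
    """Keep only the theta-fraction of gradient entries with the largest
    magnitude and zero out the rest -- the 'selective sharing' step."""
    k = max(1, int(theta * grad.size))
    threshold = np.sort(np.abs(grad))[-k]
    return np.where(np.abs(grad) >= threshold, grad, 0.0)

# Two hypothetical participants with disjoint private datasets.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
X1, X2 = rng.normal(size=(50, 3)), rng.normal(size=(50, 3))
y1, y2 = X1 @ true_w, X2 @ true_w

global_w = np.zeros(3)   # parameters held by the shared server
lr, theta = 0.1, 0.5     # learning rate; fraction of gradients shared

for _ in range(200):
    for X, y in ((X1, y1), (X2, y2)):  # asynchronous in the real system
        local_w = global_w.copy()       # download current parameters
        g = local_gradient(local_w, X, y)
        global_w -= lr * select_fraction(g, theta)  # upload selected part
```

Even though each participant uploads only its largest-magnitude gradient components and never reveals its raw data, the shared model still converges toward the weights that fit both private datasets, which is the utility/privacy trade-off the abstract highlights.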
Funding Information
- National Institutes of Health (LM011028-01)
- National Science Foundation (1223396 and 1408944)
- Schweizerische Nationalfonds zur Förderung der Wissenschaftlichen Forschung