Privacy-Preserving Deep Learning
- 12 October 2015
- conference paper
- Published by Association for Computing Machinery (ACM)
Abstract
Deep learning based on artificial neural networks is a very popular approach to modeling, classifying, and recognizing complex data such as images, speech, and text. The unprecedented accuracy of deep learning methods has turned them into the foundation of new AI-based services on the Internet. Commercial companies that collect user data on a large scale have been the main beneficiaries of this trend since the success of deep learning techniques is directly proportional to the amount of data available for training. Massive data collection required for deep learning presents obvious privacy issues. Users' personal, highly sensitive data such as photos and voice recordings is kept indefinitely by the companies that collect it. Users can neither delete it, nor restrict the purposes for which it is used. Furthermore, centrally kept data is subject to legal subpoenas and extra-judicial surveillance. Many data owners--for example, medical institutions that may want to apply deep learning methods to clinical records--are prevented by privacy and confidentiality concerns from sharing the data and thus benefitting from large-scale deep learning. In this paper, we design, implement, and evaluate a practical system that enables multiple parties to jointly learn an accurate neural-network model for a given objective without sharing their input datasets. We exploit the fact that the optimization algorithms used in modern deep learning, namely, those based on stochastic gradient descent, can be parallelized and executed asynchronously. Our system lets participants train independently on their own datasets and selectively share small subsets of their models' key parameters during training. This offers an attractive point in the utility/privacy tradeoff space: participants preserve the privacy of their respective data while still benefitting from other participants' models and thus boosting their learning accuracy beyond what is achievable solely on their own inputs. 
We demonstrate the accuracy of our privacy-preserving deep learning on benchmark datasets.
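The mechanism the abstract describes, participants running SGD locally and uploading only a small, selected fraction of their parameter updates to a shared model, can be illustrated with a toy sketch. This is a hedged illustration, not the paper's implementation: the two-participant setup, the linear model standing in for a neural network, and the helper names (`local_gradient`, `select_fraction`) are all assumptions made for brevity; `theta` plays the role of the fraction of gradients each participant shares per round.

```python
import numpy as np

def local_gradient(weights, X, y):
    """Gradient of mean squared error for a linear model
    (a stand-in for a neural network's backpropagation step)."""
    return X.T @ (X @ weights - y) / len(y)

def select_fraction(grad, theta):
    """Keep only the theta-fraction of gradient entries with the largest
    magnitude and zero out the rest -- the 'selective sharing' step."""
    k = max(1, int(theta * grad.size))
    threshold = np.sort(np.abs(grad))[-k]
    return np.where(np.abs(grad) >= threshold, grad, 0.0)

# Two hypothetical participants with disjoint private datasets.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
X1, X2 = rng.normal(size=(50, 3)), rng.normal(size=(50, 3))
y1, y2 = X1 @ true_w, X2 @ true_w

global_w = np.zeros(3)   # parameters held by the shared server
lr, theta = 0.1, 0.5     # learning rate; fraction of gradients shared

for _ in range(200):
    for X, y in ((X1, y1), (X2, y2)):  # asynchronous in the real system
        local_w = global_w.copy()       # download current parameters
        g = local_gradient(local_w, X, y)
        global_w -= lr * select_fraction(g, theta)  # upload selected part
```

Even though each participant uploads only its largest-magnitude gradient components and never reveals its raw data, the shared model still converges toward the weights that fit both private datasets, which is the utility/privacy trade-off the abstract highlights.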
Funding Information
- National Institutes of Health (LM011028-01)
- National Science Foundation (1223396 and 1408944)
- Schweizerische Nationalfonds zur Förderung der Wissenschaftlichen Forschung