Towards Making Systems Forget with Machine Unlearning

1 May 2015

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

No. 10816011, p. 463-480
https://doi.org/10.1109/sp.2015.35

Abstract

Today's systems produce a rapidly growing amount of data, and that data is in turn used to derive more data, forming a complex data propagation network that we call the data's lineage. There are many reasons why users want systems to forget certain data, including its lineage. From a privacy perspective, users who become concerned about new privacy risks of a system often want the system to forget their data and its lineage. From a security perspective, if an attacker pollutes an anomaly detector by injecting manually crafted data into the training data set, the detector must forget the injected data to regain security. From a usability perspective, a user can remove noise and incorrect entries so that a recommendation engine gives useful recommendations. Therefore, we envision forgetting systems, capable of forgetting certain data and their lineages completely and quickly. This paper focuses on making learning systems forget, a process we call machine unlearning, or simply unlearning. We present a general, efficient unlearning approach that transforms the learning algorithms used by a system into a summation form. To forget a training data sample, our approach simply updates a small number of summations, which is asymptotically faster than retraining from scratch. Our approach is general because the summation form comes from statistical query learning, in which many machine learning algorithms can be implemented. Our approach also applies to all stages of machine learning, including feature selection and modeling. Our evaluation, on four diverse learning systems and real-world workloads, shows that our approach is general, effective, fast, and easy to use.
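
The summation idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the class name CountingNaiveBayes, its methods, and the toy data are hypothetical, and a naive Bayes classifier is chosen here only because its entire state is a set of counts. Under those assumptions, forgetting one sample means subtracting its contribution from the few counts it touched, and the resulting state matches a model retrained without that sample.

```python
from collections import Counter, defaultdict


class CountingNaiveBayes:
    """Toy classifier whose whole state is a set of counts (summations).

    Hypothetical illustration only; not the system described in the paper.
    """

    def __init__(self):
        # Sum over training samples of [label == c].
        self.class_counts = Counter()
        # Sum over training samples of [label == c and feature f present].
        self.feature_counts = defaultdict(Counter)

    def learn(self, features, label):
        self.class_counts[label] += 1
        for f in set(features):
            self.feature_counts[label][f] += 1

    def unlearn(self, features, label):
        # Subtract the sample's contribution from each summation it touched.
        # Cost depends on the sample's size, not on the training set's size.
        self.class_counts[label] -= 1
        for f in set(features):
            self.feature_counts[label][f] -= 1

    def score(self, features, label):
        # Class prior times Laplace-smoothed likelihoods of the present features.
        n = self.class_counts[label]
        p = n / sum(self.class_counts.values())
        for f in set(features):
            p *= (self.feature_counts[label][f] + 1) / (n + 2)
        return p

    def predict(self, features):
        return max(self.class_counts, key=lambda c: self.score(features, c))

    def state(self):
        # Drop zero entries so two models can be compared directly.
        classes = {c: n for c, n in self.class_counts.items() if n}
        feats = {
            (c, f): n
            for c, counts in self.feature_counts.items()
            for f, n in counts.items()
            if n
        }
        return classes, feats


# Made-up training data; the third sample is the one to forget.
data = [
    (["cheap", "pills"], "spam"),
    (["meeting", "agenda"], "ham"),
    (["cheap", "meeting"], "spam"),
    (["agenda", "notes"], "ham"),
]

model = CountingNaiveBayes()
for features, label in data:
    model.learn(features, label)
model.unlearn(*data[2])

# Reference model: retrained from scratch without the forgotten sample.
retrained = CountingNaiveBayes()
for i, (features, label) in enumerate(data):
    if i != 2:
        retrained.learn(features, label)

print(model.state() == retrained.state())
```

The comparison at the end checks the property the abstract describes as forgetting completely: after the update, the counts are the same as if the sample had never been in the training set.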

This publication has 34 references indexed in Scilit:

Adversarial stylometry
ACM Transactions on Information and System Security, 2012
The security of machine learning
Machine Learning, 2010
Learning to classify with missing and corrupted features
Machine Learning, 2009
Incremental and Decremental Learning for Linear Support Vector Machines
Lecture Notes in Computer Science, 2007
Support Vector Data Description
Machine Learning, 2004
Item-based top-N recommendation algorithms
ACM Transactions on Information Systems, 2004
Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning
Nature Medicine, 2002
Efficient noise-tolerant learning from statistical queries
Journal of the ACM, 1998
Syntactic clustering of the Web
Computer Networks and ISDN Systems, 1997
Induction of decision trees
Machine Learning, 1986

Cited by 148 articles