Towards Making Systems Forget with Machine Unlearning
- 1 May 2015
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 10816011, p. 463-480
- https://doi.org/10.1109/sp.2015.35
Abstract
Today's systems produce a rapidly exploding amount of data, and the data further derives more data, forming a complex data propagation network that we call the data's lineage. There are many reasons that users want systems to forget certain data including its lineage. From a privacy perspective, users who become concerned with new privacy risks of a system often want the system to forget their data and lineage. From a security perspective, if an attacker pollutes an anomaly detector by injecting manually crafted data into the training data set, the detector must forget the injected data to regain security. From a usability perspective, a user can remove noise and incorrect entries so that a recommendation engine gives useful recommendations. Therefore, we envision forgetting systems, capable of forgetting certain data and their lineages, completely and quickly. This paper focuses on making learning systems forget, the process of which we call machine unlearning, or simply unlearning. We present a general, efficient unlearning approach by transforming learning algorithms used by a system into a summation form. To forget a training data sample, our approach simply updates a small number of summations -- asymptotically faster than retraining from scratch. Our approach is general, because the summation form is from the statistical query learning in which many machine learning algorithms can be implemented. Our approach also applies to all stages of machine learning, including feature selection and modeling. Our evaluation, on four diverse learning systems and real-world workloads, shows that our approach is general, effective, fast, and easy to use.
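The summation idea in the abstract can be illustrated with a small sketch. It is not the paper's implementation; it assumes a toy naive Bayes text classifier whose whole model is a set of counts, so that each count is a sum over training samples. The class name `SummationNaiveBayes`, its methods, and the example data are all hypothetical. Forgetting a sample then means subtracting its contribution from a few counts instead of retraining.

```python
import math
from collections import Counter


class SummationNaiveBayes:
    """Toy naive Bayes whose model state is a set of counts (summations).

    Hypothetical illustration only; names and smoothing are assumptions.
    """

    def __init__(self):
        # Sum over training samples of [sample label == label].
        self.label_counts = Counter()
        # Sum over training samples of [feature present and label == label].
        self.feature_counts = Counter()

    def _update(self, features, label, sign):
        # A sample touches one label count and one count per distinct feature.
        self.label_counts[label] += sign
        for feature in set(features):
            self.feature_counts[(feature, label)] += sign

    def learn(self, features, label):
        self._update(features, label, +1)

    def unlearn(self, features, label):
        # Subtract exactly what learn() added for this sample.
        self._update(features, label, -1)

    def state(self):
        # Drop zero entries so states compare equal after unlearning.
        labels = {k: v for k, v in self.label_counts.items() if v != 0}
        feats = {k: v for k, v in self.feature_counts.items() if v != 0}
        return labels, feats

    def predict(self, features):
        # Scores only the features present in the query, with add-one smoothing.
        total = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n in self.label_counts.items():
            if n <= 0:
                continue
            score = math.log(n / total)
            for feature in set(features):
                count = self.feature_counts[(feature, label)]
                score += math.log((count + 1) / (n + 2))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


clean = [(["cheap", "pills"], "spam"), (["meeting", "notes"], "ham")]
polluted = (["meeting", "pills"], "spam")  # injected sample to be forgotten

retrained = SummationNaiveBayes()   # trained on clean data only
unlearned = SummationNaiveBayes()   # trained on everything, then made to forget
for features, label in clean:
    retrained.learn(features, label)
    unlearned.learn(features, label)
unlearned.learn(*polluted)
prediction_while_polluted = unlearned.predict(["meeting"])
unlearned.unlearn(*polluted)
```

In this sketch, unlearning a sample costs time proportional to the number of features in that sample, while retraining would revisit every remaining sample. After `unlearn`, the counts match those of a model that never saw the injected sample.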
This publication has 34 references indexed in Scilit:
- Adversarial stylometry. ACM Transactions on Information and System Security, 2012
- The security of machine learning. Machine Learning, 2010
- Learning to classify with missing and corrupted features. Machine Learning, 2009
- Incremental and Decremental Learning for Linear Support Vector Machines. Lecture Notes in Computer Science, 2007
- Support Vector Data Description. Machine Learning, 2004
- Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 2004
- Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nature Medicine, 2002
- Efficient noise-tolerant learning from statistical queries. Journal of the ACM, 1998
- Syntactic clustering of the Web. Computer Networks and ISDN Systems, 1997
- Induction of decision trees. Machine Learning, 1986