Using Anonymized Data for Classification
- 1 March 2009
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in International Conference on Data Engineering
- p. 429-440
- https://doi.org/10.1109/icde.2009.19
Abstract
In recent years, anonymization methods have emerged as an important tool to preserve individual privacy when releasing privacy-sensitive data sets. This interest in anonymization techniques has resulted in a plethora of methods for anonymizing data under different privacy and utility assumptions. At the same time, there has been little research addressing how to effectively use the anonymized data for data mining in general and for distributed data mining in particular. In this paper, we propose a new approach for building classifiers using anonymized data by modeling anonymized data as uncertain data. In our method, we do not assume any probability distribution over the data. Instead, we propose collecting all necessary statistics during anonymization and releasing these together with the anonymized data. We show that releasing such statistics does not violate anonymity. Experiments spanning various alternatives in both local and distributed data mining settings reveal that our method performs better than heuristic approaches for handling anonymized data.
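The idea of releasing statistics together with the anonymized data can be illustrated with a minimal sketch. The abstract does not say which statistics are collected or which classifier is used, so everything below is an assumption made for illustration: a single numeric quasi-identifier, a per-group mean as the released statistic, a nearest-centroid classifier, and the hypothetical function names `anonymize_with_stats`, `class_centroids`, and `predict`. The `use_stats=False` path stands in for a heuristic that replaces each generalized interval with its midpoint.

```python
from statistics import mean


def anonymize_with_stats(records, k):
    """Group (value, label) records into groups of at least k rows.

    Each released group carries the generalized interval, the group mean
    (the assumed released statistic), and the class labels of its rows.
    """
    ordered = sorted(records, key=lambda r: r[0])
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    # Merge a short trailing group into its predecessor so every group has >= k rows.
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    released = []
    for group in groups:
        values = [value for value, _ in group]
        released.append({
            "interval": (min(values), max(values)),
            "mean": mean(values),
            "labels": [label for _, label in group],
        })
    return released


def class_centroids(released, use_stats=True):
    """Compute one centroid per class from the released groups.

    use_stats=True uses the released group mean for every row in a group;
    use_stats=False falls back to the interval midpoint heuristic.
    """
    totals = {}
    for group in released:
        low, high = group["interval"]
        representative = group["mean"] if use_stats else (low + high) / 2
        for label in group["labels"]:
            running_sum, count = totals.get(label, (0.0, 0))
            totals[label] = (running_sum + representative, count + 1)
    return {label: s / n for label, (s, n) in totals.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


records = [(20, "a"), (22, "a"), (24, "a"), (60, "b"), (62, "b"), (90, "b")]
released = anonymize_with_stats(records, k=3)
centroids = class_centroids(released, use_stats=True)
midpoint_centroids = class_centroids(released, use_stats=False)
```

With skewed groups such as the second one here, the released mean and the interval midpoint give different class centroids, which is the kind of gap between statistics-based and heuristic handling that the abstract refers to.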
This publication has 20 references indexed in Scilit:
- On Unifying Privacy and Uncertain Data Models. Published by Institute of Electrical and Electronics Engineers (IEEE), 2008
- Efficient Mining of Frequent Patterns from Uncertain Data. Published by Institute of Electrical and Electronics Engineers (IEEE), 2007
- Mining Frequent Itemsets from Uncertain Data. Published by Springer Science and Business Media LLC, 2007
- Aggregate Query Answering on Anonymized Tables. Published by Institute of Electrical and Electronics Engineers (IEEE), 2007
- An Efficient Distance Calculation Method for Uncertain Objects. Published by Institute of Electrical and Electronics Engineers (IEEE), 2007
- Efficient Clustering of Uncertain Data. IEEE International Conference on Data Mining (ICDM), 2006
- Utility-based anonymization using local recoding. Published by Association for Computing Machinery (ACM), 2006
- Workload-aware anonymization. Published by Association for Computing Machinery (ACM), 2006
- Injecting utility into anonymized datasets. Published by Association for Computing Machinery (ACM), 2006
- Hierarchical Density-Based Clustering of Uncertain Data. Published by Institute of Electrical and Electronics Engineers (IEEE), 2006