A Scalable Two-Phase Top-Down Specialization Approach for Data Anonymization Using MapReduce on Cloud
- 25 February 2013
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Parallel and Distributed Systems
- Vol. 25 (2), 363-373
- https://doi.org/10.1109/tpds.2013.48
Abstract
A large number of cloud services require users to share private data like electronic health records for data analysis or mining, bringing privacy concerns. Anonymizing data sets via generalization to satisfy certain privacy requirements such as k-anonymity is a widely used category of privacy preserving techniques. At present, the scale of data in many cloud applications increases tremendously in accordance with the Big Data trend, thereby making it a challenge for commonly used software tools to capture, manage, and process such large-scale data within a tolerable elapsed time. As a result, it is a challenge for existing anonymization approaches to achieve privacy preservation on privacy-sensitive large-scale data sets due to their insufficiency of scalability. In this paper, we propose a scalable two-phase top-down specialization (TDS) approach to anonymize large-scale data sets using the MapReduce framework on cloud. In both phases of our approach, we deliberately design a group of innovative MapReduce jobs to concretely accomplish the specialization computation in a highly scalable way. Experimental evaluation results demonstrate that with our approach, the scalability and efficiency of TDS can be significantly improved over existing approaches.
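The abstract names two ideas that a small example can make concrete: generalizing values to reach k-anonymity, and splitting the counting work into map and reduce steps. The following Python sketch is illustrative only and is not the paper's algorithm or its MapReduce jobs. The taxonomy `PARENT`, the single quasi-identifier (a job title), the sample partitions, and all function names are assumptions made up for this example; the map and reduce steps run in-process rather than on a cluster.

```python
from collections import Counter
from functools import reduce

# Hypothetical taxonomy tree: each value maps to its more general parent.
PARENT = {
    "Engineer": "Professional",
    "Lawyer": "Professional",
    "Dancer": "Artist",
    "Writer": "Artist",
    "Professional": "Any",
    "Artist": "Any",
}
LEAVES = {"Engineer", "Lawyer", "Dancer", "Writer"}


def generalize(value, cut):
    """Walk up the taxonomy until the value belongs to the current cut."""
    while value not in cut:
        value = PARENT[value]
    return value


def map_phase(partition, cut):
    """Map step: count generalized values within one data partition."""
    return Counter(generalize(v, cut) for v in partition)


def reduce_phase(partial_counts):
    """Reduce step: sum the per-partition counts into global counts."""
    return reduce(lambda a, b: a + b, partial_counts, Counter())


def is_k_anonymous(partitions, cut, k):
    """True if every generalized group holds at least k records."""
    totals = reduce_phase(map_phase(p, cut) for p in partitions)
    return all(n >= k for n in totals.values())


# Assumed sample data, split into two partitions as a cluster might.
partitions = [
    ["Engineer", "Lawyer", "Dancer"],
    ["Writer", "Engineer", "Dancer"],
]

# Top-down: start from the most general cut, then try more specific ones.
print(is_k_anonymous(partitions, {"Any"}, 3))
print(is_k_anonymous(partitions, {"Professional", "Artist"}, 3))
print(is_k_anonymous(partitions, LEAVES, 2))
```

In this toy data the fully general cut and the intermediate cut both keep every group at three records or more, while the leaf-level cut leaves single-record groups. A top-down procedure in this spirit would therefore stop specializing at the intermediate level for k = 3.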