Differentially Private High-Dimensional Data Publication via Sampling-Based Inference

10 August 2015

conference paper
conference paper
Published by Association for Computing Machinery (ACM)

https://doi.org/10.1145/2783258.2783379

Abstract

Releasing high-dimensional data enables a wide spectrum of data mining tasks. Yet, individual privacy has been a major obstacle to data sharing. In this paper, we consider the problem of releasing high-dimensional data with differential privacy guarantees. We propose a novel solution to preserve the joint distribution of a high-dimensional dataset. We first develop a robust sampling-based framework to systematically explore the dependencies among all attributes and subsequently build a dependency graph. This framework is coupled with a generic threshold mechanism to significantly improve accuracy. We then identify a set of marginal tables from the dependency graph to approximate the joint distribution based on the solid inference foundation of the junction tree algorithm while minimizing the resultant error. We prove that selecting the optimal marginals with the goal of minimizing error is NP-hard and, thus, design an approximation algorithm using an integer programming relaxation and the constrained concave-convex procedure. Extensive experiments on real datasets demonstrate that our solution substantially outperforms the state-of-the-art competitors.

Keywords

Funding Information

National Natural Science Foundation of China (61305071)
Research Grants Council, University Grants Committee, Hong Kong (GRF/HKBU211512, GRF/HKBU12200114)

This publication has 24 references indexed in Scilit:

Differentially Private Histogram Publishing through Lossy Compression
Published by Institute of Electrical and Electronics Engineers (IEEE) ,2012
Differentially private summaries for sparse data
Published by Association for Computing Machinery (ACM) ,2012
Differentially private data cubes
Published by Association for Computing Machinery (ACM) ,2011
A Multiplicative Weights Mechanism for Privacy-Preserving Data Analysis
Published by Institute of Electrical and Electronics Engineers (IEEE) ,2010
Boosting the accuracy of differentially private histograms through consistency
Proceedings of the VLDB Endowment, 2010
Statistical Meta‐Analysis with Applications
Wiley Series in Probability and Statistics, 2008
Privacy, accuracy, and consistency too
Published by Association for Computing Machinery (ACM) ,2007
Calibrating Noise to Sensitivity in Private Data Analysis
Lecture Notes in Computer Science, 2006
CORDS
Published by Association for Computing Machinery (ACM) ,2004
The posterior probability of Bayes nets with strong dependences
Soft Computing, 1999

Cited by 120 articles