Private record matching using differential privacy
- 22 March 2010
- conference paper
- conference paper
- Published by Association for Computing Machinery (ACM)
- p. 123-134
- https://doi.org/10.1145/1739041.1739059
Abstract
Private matching between datasets owned by distinct parties is a challenging problem with several applications. Private matching allows two parties to identify the records that are close to each other according to some distance functions, such that no additional information other than the join result is disclosed to any party. Private matching can be solved securely and accurately using secure multi-party computation (SMC) techniques, but such an approach is prohibitively expensive in practice. Previous work proposed the release of sanitized versions of the sensitive datasets which allows blocking, i.e., filtering out sub-sets of records that cannot be part of the join result. This way, SMC is applied only to a small fraction of record pairs, reducing the matching cost to acceptable levels. The blocking step is essential for the privacy, accuracy and efficiency of matching. However, the state-of-the-art focuses on sanitization based on k-anonymity, which does not provide sufficient privacy. We propose an alternative design centered on differential privacy, a novel paradigm that provides strong privacy guarantees. The realization of the new model presents difficult challenges, such as the evaluation of distance-based matching conditions with the help of only a statistical queries interface. Specialized versions of data indexing structures (e.g., kd-trees) also need to be devised, in order to comply with differential privacy. Experiments conducted on the real-world Census-income dataset show that, although our methods provide strong privacy, their effectiveness in reducing matching cost is not far from that of k-anonymity based counterparts.Keywords
Funding Information
- Air Force Office of Scientific Research (FA9550-08-1-0265FA9550-07-1-0041)
- Division of Computer and Network Systems (Career-0845803CNS-0716424)
This publication has 19 references indexed in Scilit:
- Output perturbation with query relaxationProceedings of the VLDB Endowment, 2008
- Privacy preserving schema and data matchingPublished by Association for Computing Machinery (ACM) ,2007
- Differential PrivacyLecture Notes in Computer Science, 2006
- Calibrating Noise to Sensitivity in Private Data AnalysisLecture Notes in Computer Science, 2006
- Blocking-aware private record linkagePublished by Association for Computing Machinery (ACM) ,2005
- Privacy-Preserving Set OperationsLecture Notes in Computer Science, 2005
- Tools for privacy preserving distributed data miningACM SIGKDD Explorations Newsletter, 2002
- k-ANONYMITY: A MODEL FOR PROTECTING PRIVACYInternational Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2002
- Enhancing privacy and trust in electronic communitiesPublished by Association for Computing Machinery (ACM) ,1999
- The R*-tree: an efficient and robust access method for points and rectanglesACM SIGMOD Record, 1990