Similar pair identification using locality-sensitive hashing technique
- 1 November 2012
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in The 6th International Conference on Soft Computing and Intelligent Systems, and The 13th International Symposium on Advanced Intelligence Systems
- p. 2117-2119
- https://doi.org/10.1109/scis-isis.2012.6505385
Abstract
Huge volumes of data pose many opportunities and challenges in business and information societies. The similar pair identification problem arises in various fields such as image retrieval, near-duplicate document identification, plagiarism analysis, and entity resolution. As the number of items grows, exhaustive pairwise similarity comparison becomes inefficient, and various techniques have been developed to handle the problem efficiently. Locality-sensitive hashing is one such technique: it avoids exhaustive pairwise comparisons when identifying similar pairs. This paper introduces a modified version of the projection-based locality-sensitive hashing technique. The proposed method reduces the chance that similar pairs fall into different buckets, which is one of the major drawbacks of the projection-based technique. We have observed that the proposed method outperforms the conventional projection-based method in that it achieves a better recall rate at the cost of some additional memory and computation.
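For orientation, the following is a minimal sketch of the conventional projection-based scheme the abstract refers to, not the modified method proposed in the paper. It assumes the standard hash form h(v) = floor((a·v + b) / w), with a drawn from a Gaussian distribution and b uniform in [0, w). The function names and the parameters `num_tables`, `hashes_per_table`, and `w` are illustrative choices, not taken from the paper.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def make_hash(dim, w, rng):
    """Build one projection hash h(v) = floor((a . v + b) / w)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random projection direction
    b = rng.uniform(0.0, w)                        # random offset within one bucket width

    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)

    return h


def candidate_pairs(points, num_tables=4, hashes_per_table=2, w=4.0, seed=0):
    """Return index pairs (i, j), i < j, that share a bucket in at least one table."""
    rng = random.Random(seed)
    dim = len(points[0])
    pairs = set()
    for _ in range(num_tables):
        hashes = [make_hash(dim, w, rng) for _ in range(hashes_per_table)]
        buckets = defaultdict(list)
        for idx, p in enumerate(points):
            # The bucket key is the tuple of all hash values for this table.
            buckets[tuple(h(p) for h in hashes)].append(idx)
        for members in buckets.values():
            # Only items in the same bucket are compared later.
            pairs.update(combinations(members, 2))
    return pairs
```

Only the returned candidate pairs need an exact similarity check, which is how the all-pairs comparison is avoided. The sketch also exposes the drawback mentioned above: two nearby points whose projections straddle a bucket boundary receive different hash values and are missed unless another table happens to group them.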
This publication has 5 references indexed in Scilit:
- Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Communications of the ACM, 2008
- Locality-sensitive hashing scheme based on p-stable distributions. Published by Association for Computing Machinery (ACM), 2004
- Fast pose estimation with parameter-sensitive hashing. Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
- Similarity estimation techniques from rounding algorithms. Published by Association for Computing Machinery (ACM), 2002
- Worst-case analysis for region and partial region searches in multidimensional binary search trees and balanced quad trees. Acta Informatica, 1977