Similar pair identification using locality-sensitive hashing technique

Abstract
Huge volumes of data pose many opportunities and challenges in business and information societies. The similar pair identification problem arises in various fields such as image retrieval, near-duplicate document identification, plagiarism analysis, and entity resolution. As the number of items grows, exhaustive pair-wise similarity comparison becomes inefficient. Various techniques have been developed to handle this problem efficiently. Locality-sensitive hashing is one such technique: it avoids pair-wise comparisons when identifying similar pairs. This paper introduces a modified version of the projection-based locality-sensitive hashing technique. The proposed method reduces the chance that similar pairs fall into different buckets, which is one of the major drawbacks of the projection-based technique. We have observed that the proposed method outperforms the conventional projection-based method in that it achieves a better recall rate, at the cost of some additional memory and computation.
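For readers unfamiliar with the conventional projection-based scheme that the abstract refers to, the following is a minimal sketch of it, not the modified method proposed in the paper. It assumes the common formulation in which each hash projects a vector onto a random direction and quantizes the result into buckets of fixed width. The names `make_hash`, `candidate_pairs`, `bucket_1d`, and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def make_hash(dim, width, rng):
    """Build one projection hash: h(x) = floor((a . x + b) / width).

    `a` is a random Gaussian direction and `b` a random offset in [0, width).
    """
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)

    def h(x):
        projection = sum(ai * xi for ai, xi in zip(a, x))
        return math.floor((projection + b) / width)

    return h


def candidate_pairs(points, num_tables=4, hashes_per_table=2, width=1.0, seed=0):
    """Return index pairs (i, j), i < j, that share a bucket in at least one table.

    Only these candidates need an exact similarity check, which is how the
    scheme avoids comparing every pair.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    pairs = set()
    for _ in range(num_tables):
        hashes = [make_hash(dim, width, rng) for _ in range(hashes_per_table)]
        buckets = defaultdict(list)
        for idx, p in enumerate(points):
            # The bucket key is the tuple of all hash values for this table.
            buckets[tuple(h(p) for h in hashes)].append(idx)
        for members in buckets.values():
            pairs.update(combinations(members, 2))
    return pairs


def bucket_1d(x, width=1.0, offset=0.0):
    """One-dimensional quantizer with a fixed offset, used to show the drawback."""
    return math.floor((x + offset) / width)


# Two nearby values straddling a bucket boundary land in different buckets,
# which is the drawback the abstract mentions.
print(bucket_1d(0.99), bucket_1d(1.01))
```

The last line illustrates why recall suffers: the values 0.99 and 1.01 are close, yet quantization places them on opposite sides of a boundary. Using several tables, as `candidate_pairs` does, is one standard way to reduce such misses, at the price of extra memory and computation.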