Scalable Similarity Search With Topology Preserving Hashing

Abstract
Hashing-based similarity search techniques is becoming increasingly popular in large data sets. To capture meaningful neighbors, the topology of a data set, which represents the neighborhood relationships between its subregions and the relative proximities between the neighbors of each subregion, e.g., the relative neighborhood ranking of each subregion, should be exploited. However, most existing hashing methods are developed to preserve neighborhood relationships while ignoring the relative neighborhood proximities. Moreover, most hashing methods lack in providing a good result ranking, since there are often lots of results sharing the same Hamming distance to a query. In this paper, we propose a novel hashing method to solve these two issues jointly. The proposed method is referred to as topology preserving hashing (TPH). TPH is distinct from prior works by also preserving the neighborhood ranking. Based on this framework, we present three different TPH methods, including linear unsupervised TPH, semisupervised TPH, and kernelized TPH. Particularly, our unsupervised TPH is capable of mining semantic relationship between unlabeled data without supervised information. Extensive experiments on four large data sets demonstrate the superior performances of the proposed methods over several state-of-the-art unsupervised and semisupervised hashing techniques.
Funding Information
  • National High Technology Research and Development Program of China (2014AA015202)
  • National Natural Science Foundation of China (61303151, 61273247, 61271428)
  • National Key Technology Research and Development Program of China (2012BAH39B02)
  • National Science Foundation of China (61128007)
  • Army Research Office (W911NF-12-1-0057)
  • Faculty Research Awards through the NEC Laboratories America Inc., Princeton, NJ, USA
  • 2012 UTSA START-R Research Award
  • National Nature Science Foundation of China.

This publication has 32 references indexed in Scilit: