C-approximate nearest neighbor query algorithm based on learning for high-dimensional data

Research output: Contribution to journalArticlepeer-review

7 Scopus citations

Abstract

Under the filter-and-refine framework and based on the learning techniques, a data-aware method for c-approximate nearest neighbor query for high-dimensional data is proposed in this paper. The study claims that data after random projection satisfies the entropy-maximizing criterion which is needed by the semantic hashing. The binary codes after random projection are treated as the labels, and a group of classifiers are trained, which are used for predicting the binary code for the query. The data objects are selected who's Hamming distances between the query satisfying the threshold as the candidates. The real distances are evaluated on the candidate subset and the smallest one is returned. Experimental results on the synthetic datasets and the real datasets show that this method outperforms the existing work with shorter binary code, in addition, the performance and the result quality can be easily tuned.

Original languageEnglish
Pages (from-to)2018-2031
Number of pages14
JournalRuan Jian Xue Bao/Journal of Software
Volume23
Issue number8
DOIs
StatePublished - Aug 2012

Keywords

  • Entropy maximizing criterion
  • High-dimensional data
  • Locality sensitive hashing
  • Random projection
  • SVM classifier
  • c-approximate nearest neighbor query

Fingerprint

Dive into the research topics of 'C-approximate nearest neighbor query algorithm based on learning for high-dimensional data'. Together they form a unique fingerprint.

Cite this