TY - JOUR
T1 - Learning to hash for indexing big data - A survey
AU - Wang, Jun
AU - Liu, Wei
AU - Kumar, Sanjiv
AU - Chang, Shih Fu
N1 - Publisher Copyright:
© 1963-2012 IEEE.
PY - 2016/1
Y1 - 2016/1
N2 - The explosive growth in Big Data has attracted much attention in designing efficient indexing and search methods recently. In many critical applications such as large-scale search and pattern matching, finding the nearest neighbors to a query is a fundamental research problem. However, the straightforward solution using exhaustive comparison is infeasible due to the prohibitive computational complexity and memory requirement. In response, approximate nearest neighbor (ANN) search based on hashing techniques has become popular due to its promising performance in both efficiency and accuracy. Prior randomized hashing methods, e.g., locality-sensitive hashing (LSH), explore data-independent hash functions with random projections or permutations. Although having elegant theoretic guarantees on the search quality in certain metric spaces, performance of randomized hashing has been shown insufficient in many real-world applications. As a remedy, new approaches incorporating data-driven learning methods in development of advanced hash functions have emerged. Such learning-to-hash methods exploit information such as data distributions or class labels when optimizing the hash codes or functions. Importantly, the learned hash codes are able to preserve the proximity of neighboring data in the original feature spaces in the hash code spaces. The goal of this paper is to provide readers with systematic understanding of insights, pros, and cons of the emerging techniques. We provide a comprehensive survey of the learning-to-hash framework and representative techniques of various types, including unsupervised, semisupervised, and supervised. In addition, we also summarize recent hashing approaches utilizing the deep learning models. Finally, we discuss the future direction and trends of research in this area.
AB - The explosive growth in Big Data has attracted much attention in designing efficient indexing and search methods recently. In many critical applications such as large-scale search and pattern matching, finding the nearest neighbors to a query is a fundamental research problem. However, the straightforward solution using exhaustive comparison is infeasible due to the prohibitive computational complexity and memory requirement. In response, approximate nearest neighbor (ANN) search based on hashing techniques has become popular due to its promising performance in both efficiency and accuracy. Prior randomized hashing methods, e.g., locality-sensitive hashing (LSH), explore data-independent hash functions with random projections or permutations. Although having elegant theoretic guarantees on the search quality in certain metric spaces, performance of randomized hashing has been shown insufficient in many real-world applications. As a remedy, new approaches incorporating data-driven learning methods in development of advanced hash functions have emerged. Such learning-to-hash methods exploit information such as data distributions or class labels when optimizing the hash codes or functions. Importantly, the learned hash codes are able to preserve the proximity of neighboring data in the original feature spaces in the hash code spaces. The goal of this paper is to provide readers with systematic understanding of insights, pros, and cons of the emerging techniques. We provide a comprehensive survey of the learning-to-hash framework and representative techniques of various types, including unsupervised, semisupervised, and supervised. In addition, we also summarize recent hashing approaches utilizing the deep learning models. Finally, we discuss the future direction and trends of research in this area.
KW - Artificial neural networks
KW - Big data
KW - Binary codes
KW - Indexing
KW - Semantics
KW - Tagging
KW - Twitter
UR - https://www.scopus.com/pages/publications/84962259785
U2 - 10.1109/JPROC.2015.2487976
DO - 10.1109/JPROC.2015.2487976
M3 - 文献综述
AN - SCOPUS:84962259785
SN - 0018-9219
VL - 104
SP - 34
EP - 57
JO - Proceedings of the IEEE
JF - Proceedings of the IEEE
IS - 1
M1 - 7360966
ER -