
Efficient locality-sensitive hashing over high-dimensional streaming data

  • Hao Wang
  • Chengcheng Yang*
  • Xiangliang Zhang
  • Xin Gao
  • *Corresponding author for this work
  • King Abdullah University of Science and Technology
  • Shenzhen University

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Approximate nearest neighbor (ANN) search in high-dimensional spaces is fundamental in many applications. Locality-sensitive hashing (LSH) is a well-known methodology to solve the ANN problem. Existing LSH-based ANN solutions typically employ a large number of individual indexes optimized for searching efficiency. Updating such indexes might be impractical when processing high-dimensional streaming data. In this paper, we present a novel disk-based LSH index that offers efficient support for both searches and updates. The contributions of our work are threefold. First, we use the write-friendly LSM-trees to store the LSH projections to facilitate efficient updates. Second, we develop a novel estimation scheme to estimate the number of required LSH functions, with which the disk storage and access costs are effectively reduced. Third, we exploit both the collision number and the projection distance to improve the efficiency of candidate selection, improving the search performance with theoretical guarantees on the result quality. Experiments on four real-world datasets show that our proposal outperforms the state-of-the-art schemes.
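To make the collision-counting idea mentioned in the abstract concrete, here is a minimal in-memory sketch in Python. It is not the paper's index: there is no LSM-tree, no disk storage, no estimation of the number of hash functions, and no use of projection distance. It assumes the common p-stable hash family h(x) = floor((a·x + b) / w); the function names and the parameters `m`, `w`, and `min_collisions` are hypothetical and chosen only for illustration.

```python
import math
import random


def make_hash_functions(dim, m, w, seed=0):
    """Draw m hash functions; each is a Gaussian vector a and an offset b in [0, w)."""
    rng = random.Random(seed)
    return [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
        for _ in range(m)
    ]


def hash_point(point, funcs, w):
    """Return one bucket id per hash function: floor((a . x + b) / w)."""
    return [
        math.floor((sum(a_i * x_i for a_i, x_i in zip(a, point)) + b) / w)
        for a, b in funcs
    ]


def collision_candidates(query, data, funcs, w, min_collisions):
    """Indices of points sharing a bucket with the query under at least
    min_collisions hash functions (a linear scan, for clarity only)."""
    q_codes = hash_point(query, funcs, w)
    candidates = []
    for idx, point in enumerate(data):
        codes = hash_point(point, funcs, w)
        collisions = sum(1 for c, qc in zip(codes, q_codes) if c == qc)
        if collisions >= min_collisions:
            candidates.append(idx)
    return candidates


def ann_search(query, data, funcs, w, min_collisions, k=1):
    """Filter by collision count, then rank survivors by exact Euclidean distance."""
    candidates = collision_candidates(query, data, funcs, w, min_collisions)
    candidates.sort(key=lambda i: math.dist(query, data[i]))
    return candidates[:k]
```

A point identical to the query collides under every hash function, so it always survives the filter. Raising `min_collisions` shrinks the candidate set at the risk of missing true neighbors, which is the trade-off that candidate-selection schemes of this kind try to control.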

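Original language: English
Pages (from-to): 3753-3766
Number of pages: 14
Journal: Neural Computing and Applications
Volume: 35
Issue number: 5
DOI
Publication status: Published - Feb 2023
Externally published: Yes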
