
GFilter: A General Gram Filter for String Similarity Search

  • East China Normal University
  • University of Queensland

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Numerous applications such as data integration, protein detection, and article copy detection share a similar core problem: given a query string, how to efficiently find all similar strings in a large-scale string collection. Many existing methods adopt a prefix-filter-based framework to solve this problem, and a number of recent works use advanced filters to improve overall search performance. In this paper, we propose a gram-based framework to achieve near-maximum filter performance. The main idea is to judiciously choose high-quality grams as the prefix of the query according to their estimated ability to filter candidates. As this selection process is proved to be an NP-hard problem, we give a cost model to measure the filter ability of grams and develop efficient heuristic algorithms to find high-quality grams. Extensive experiments on real datasets demonstrate the superiority of the proposed framework in comparison with state-of-the-art approaches.
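
To make the prefix-filter idea in the abstract concrete, the following is a minimal sketch under stated assumptions, not the GFilter algorithm itself. It relies on a standard property of q-grams: one edit operation changes at most q grams, so any q·τ + 1 gram occurrences taken from the query must include at least one that also appears in every string within edit distance τ. The sketch picks those grams by shortest inverted list, a simple stand-in for the paper's cost model. All function names (`qgrams`, `build_index`, `edit_distance`, `search`) and the sample data are hypothetical.

```python
from collections import defaultdict


def qgrams(s, q):
    """Return all overlapping substrings of length q, in order of position."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def build_index(strings, q):
    """Inverted index: gram -> set of ids of the strings that contain it."""
    index = defaultdict(set)
    for sid, s in enumerate(strings):
        for g in qgrams(s, q):
            index[g].add(sid)
    return index


def edit_distance(a, b):
    """Levenshtein distance via the classic row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def search(query, strings, index, q, tau):
    """Return ids of strings within edit distance tau of the query."""
    grams = qgrams(query, q)
    need = q * tau + 1  # number of gram occurrences that guarantees no misses
    if len(grams) < need:
        # Query too short for the filter to prune anything: verify everything.
        candidates = set(range(len(strings)))
    else:
        # Assumed heuristic (not the paper's cost model): prefer the grams
        # with the shortest inverted lists, since they admit fewer candidates.
        chosen = sorted(grams, key=lambda g: len(index.get(g, ())))[:need]
        candidates = set()
        for g in chosen:
            candidates |= index.get(g, set())
    # Cheap length check first, then exact verification.
    return sorted(
        sid for sid in candidates
        if abs(len(strings[sid]) - len(query)) <= tau
        and edit_distance(strings[sid], query) <= tau
    )


# Hypothetical sample data for illustration only.
strings = ["similarity", "similarly", "simulator", "dissimilar", "polarity"]
index = build_index(strings, q=2)
print(search("similarity", strings, index, q=2, tau=2))
```

Because any q·τ + 1 gram occurrences are sufficient for correctness, the choice of which ones to use only affects how many candidates must be verified. That freedom is what a gram-selection strategy such as the one described in the abstract exploits.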

Original language: English
Article number: 6880793
Pages (from-to): 1005-1018
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 27
Issue number: 4
DOI
Publication status: Published - 1 Apr 2015
