
Computing probability threshold set similarity on probabilistic sets

  • East China Normal University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

Abstract

Set similarity computation has become an increasingly important tool in many real-world applications, such as near-duplicate detection, data cleaning, and record linkage, in which sets are often uncertain because of missing data, imprecision, and noise. The main challenge in evaluating similarity between probabilistic sets is the exponential blowup in the number of possible worlds induced by uncertainty. In this paper, we define the probability threshold set similarity (PTSS) between two probabilistic sets based on possible world semantics and propose an exact solution that computes PTSS via dynamic programming. To speed up the evaluation of the probability threshold set query (PTSQ), we derive an efficient and effective pruning rule for PTSQ. Finally, we conduct extensive experiments on both real and synthetic datasets to verify the effectiveness and efficiency of our algorithms.
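The abstract does not state the uncertainty model, the similarity measure, or the recurrence used, so the following is only a minimal sketch of how a possible-world threshold similarity can be computed by dynamic programming, not the paper's algorithm. It assumes that every element belongs to each set independently with a given probability, that the measure is Jaccard similarity, and that a world in which both sets are empty has similarity 0. Instead of enumerating all 4^n worlds, the DP tracks the joint distribution of intersection size and union size, which is all Jaccard depends on. The names `ptss_jaccard`, `ptss_jaccard_brute_force`, `intersection_bound`, and `ptsq` are hypothetical, and the pruning filter is a simple illustrative bound, not the rule derived in the paper.

```python
from collections import defaultdict
from itertools import product


def ptss_jaccard(pa, pb, tau):
    """Pr[Jaccard(A, B) >= tau] over all possible worlds, via dynamic programming.

    pa, pb map each element to its membership probability (assumed independent).
    A world where A and B are both empty counts as similarity 0 (assumption).
    """
    elements = list(set(pa) | set(pb))
    # dp[(i, u)]: probability that the elements processed so far give
    # |A ∩ B| = i and |A ∪ B| = u.
    dp = {(0, 0): 1.0}
    for e in elements:
        a, b = pa.get(e, 0.0), pb.get(e, 0.0)
        p_both = a * b                     # e is in A and in B
        p_one = a * (1 - b) + (1 - a) * b  # e is in exactly one of the sets
        p_none = (1 - a) * (1 - b)         # e is in neither set
        nxt = defaultdict(float)
        for (i, u), p in dp.items():
            nxt[(i + 1, u + 1)] += p * p_both
            nxt[(i, u + 1)] += p * p_one
            nxt[(i, u)] += p * p_none
        dp = nxt
    return sum(p for (i, u), p in dp.items() if u > 0 and i / u >= tau)


def ptss_jaccard_brute_force(pa, pb, tau):
    """Same quantity by explicit enumeration of all 4^n worlds (for checking only)."""
    elements = list(set(pa) | set(pb))
    total = 0.0
    for in_a in product((False, True), repeat=len(elements)):
        for in_b in product((False, True), repeat=len(elements)):
            prob, inter, union = 1.0, 0, 0
            for e, x, y in zip(elements, in_a, in_b):
                a, b = pa.get(e, 0.0), pb.get(e, 0.0)
                prob *= (a if x else 1 - a) * (b if y else 1 - b)
                inter += int(x and y)
                union += int(x or y)
            if union > 0 and inter / union >= tau:
                total += prob
    return total


def intersection_bound(pa, pb):
    """Upper bound on Pr[Jaccard >= tau] for any tau > 0: Pr[A ∩ B is nonempty]."""
    q = 1.0
    for e in set(pa) & set(pb):
        q *= 1.0 - pa[e] * pb[e]  # probability that e is not shared
    return 1.0 - q


def ptsq(pa, pb, tau, alpha):
    """Hypothetical threshold query: is Pr[Jaccard(A, B) >= tau] at least alpha?"""
    if tau > 0 and intersection_bound(pa, pb) < alpha:
        return False  # pruned without running the DP
    return ptss_jaccard(pa, pb, tau) >= alpha


if __name__ == "__main__":
    # One shared element present in each set with probability 0.5:
    # only the world where it is in both sets reaches similarity 1.
    print(ptss_jaccard({"x": 0.5}, {"x": 0.5}, 1.0))  # 0.25
```

The DP keeps at most O(n^2) (intersection, union) states and does constant work per state and element, so it runs in O(n^3) time for n distinct elements instead of O(4^n). The filter is valid because, for tau > 0, Jaccard similarity can reach tau only if the intersection is nonempty, and under independence that probability is a closed-form product.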

Original language: English
Host publication title: Web-Age Information Management - 16th International Conference, WAIM 2015, Proceedings
Editors: Yizhou Sun, Jian Li
Publisher: Springer Verlag
Pages: 374-386
Number of pages: 13
ISBN (Electronic): 9783319210414
DOI
Publication status: Published - 2015
Event: 16th International Conference on Web-Age Information Management, WAIM 2015 - Qingdao, China
Duration: 8 Jun 2015 to 10 Jun 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9098
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th International Conference on Web-Age Information Management, WAIM 2015
Country/Territory: China
City: Qingdao
Period: 8/06/15 to 10/06/15
