TY - GEN
T1 - Computing probability threshold set similarity on probabilistic sets
AU - Wang, Lei
AU - Gao, Ming
AU - Zhang, Rong
AU - Jin, Cheqing
AU - Zhou, Aoying
N1 - Publisher Copyright: © Springer International Publishing Switzerland 2015.
PY - 2015
Y1 - 2015
N2 - Set similarity computation has become an increasingly important tool in many real-world applications, such as near-duplicate detection, data cleaning, and record linkage, in which sets are often uncertain due to missing data, imprecision, and noise. The challenge of evaluating similarity between probabilistic sets stems mainly from the exponential blowup in the number of possible worlds induced by uncertainty. In this paper, we define the probability threshold set similarity (PTSS) between two probabilistic sets based on possible world semantics and propose an exact solution that computes PTSS via dynamic programming. To speed up the evaluation of the probability threshold set query (PTSQ), we derive an efficient and effective pruning rule for PTSQ. Finally, we conduct extensive experiments on both real and synthetic datasets to verify the effectiveness and efficiency of our algorithms.
UR - https://www.scopus.com/pages/publications/84937440391
U2 - 10.1007/978-3-319-21042-1_30
DO - 10.1007/978-3-319-21042-1_30
M3 - Conference contribution
AN - SCOPUS:84937440391
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 374
EP - 386
BT - Web-Age Information Management - 16th International Conference, WAIM 2015, Proceedings
A2 - Sun, Yizhou
A2 - Li, Jian
PB - Springer Verlag
T2 - 16th International Conference on Web-Age Information Management, WAIM 2015
Y2 - 8 June 2015 through 10 June 2015
ER -