Computing probability threshold set similarity on probabilistic sets

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Currently, the computation of set similarity has become an increasingly important tool in many real-world applications, such as near-duplicate detection, data cleaning and record linkage, etc., in which sets often are uncertain due to date missing, imprecise and noise, etc. The challenge of evaluating similarity between probabilistic sets mainly stems from the exponential blowup in the number of possible worlds induced by uncertainty. In this paper, we define the probability threshold set similarity (PTSS) between two probabilistic sets based on the possible world semantics and propose an exact solution to compute PTSS via the dynamic programming. To speed up the computation of the probability threshold set query (PTSQ), we derive an efficient and effective pruning rule for PTSQ. Finally, we conduct extensive experiments to verify the effectiveness and efficiency of our algorithms using both real and synthetic datasets.

Original languageEnglish
Title of host publicationWeb-Age Information Management - 16th International Conference, WAIM 2015, Proceedings
EditorsYizhou Sun, Jian Li
PublisherSpringer Verlag
Pages374-386
Number of pages13
ISBN (Electronic)9783319210414
DOIs
StatePublished - 2015
Event16th International Conference on Web-Age Information Management, WAIM 2015 - Qingdao, China
Duration: 8 Jun 201510 Jun 2015

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume9098
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference16th International Conference on Web-Age Information Management, WAIM 2015
Country/TerritoryChina
CityQingdao
Period8/06/1510/06/15

Fingerprint

Dive into the research topics of 'Computing probability threshold set similarity on probabilistic sets'. Together they form a unique fingerprint.

Cite this