Cleaning uncertain data for top-k queries

  • Luyi Mo
  • , Reynold Cheng
  • , Xiang Li
  • , David W. Cheung
  • , Xuan S. Yang

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

28 Scopus citations

Abstract

The information managed in emerging applications, such as sensor networks, location-based services, and data integration, is inherently imprecise. To handle data uncertainty, probabilistic databases have been recently developed. In this paper, we study how to quantify the ambiguity of answers returned by a probabilistic top-k query. We develop efficient algorithms to compute the quality of this query under the possible world semantics. We further address the cleaning of a probabilistic database, in order to improve top-k query quality. Cleaning involves the reduction of ambiguity associated with the database entities. For example, the uncertainty of a temperature value acquired from a sensor can be reduced, or cleaned, by requesting its newest value from the sensor. While this "cleaning operation" may produce a better query result, it may involve a cost and fail. We investigate the problem of selecting entities to be cleaned under a limited budget. Particularly, we propose an optimal solution and several heuristics. Experiments show that the greedy algorithm is efficient and close to optimal.

Original languageEnglish
Title of host publicationICDE 2013 - 29th International Conference on Data Engineering
Pages134-145
Number of pages12
DOIs
StatePublished - 2013
Externally publishedYes
Event29th International Conference on Data Engineering, ICDE 2013 - Brisbane, QLD, Australia
Duration: 8 Apr 201311 Apr 2013

Publication series

NameProceedings - International Conference on Data Engineering
ISSN (Print)1084-4627

Conference

Conference29th International Conference on Data Engineering, ICDE 2013
Country/TerritoryAustralia
CityBrisbane, QLD
Period8/04/1311/04/13

Fingerprint

Dive into the research topics of 'Cleaning uncertain data for top-k queries'. Together they form a unique fingerprint.

Cite this