TY - GEN
T1 - Cleaning uncertain data for top-k queries
AU - Mo, Luyi
AU - Cheng, Reynold
AU - Li, Xiang
AU - Cheung, David W.
AU - Yang, Xuan S.
PY - 2013
Y1 - 2013
N2 - The information managed in emerging applications, such as sensor networks, location-based services, and data integration, is inherently imprecise. To handle data uncertainty, probabilistic databases have been recently developed. In this paper, we study how to quantify the ambiguity of answers returned by a probabilistic top-k query. We develop efficient algorithms to compute the quality of this query under the possible world semantics. We further address the cleaning of a probabilistic database, in order to improve top-k query quality. Cleaning involves the reduction of ambiguity associated with the database entities. For example, the uncertainty of a temperature value acquired from a sensor can be reduced, or cleaned, by requesting its newest value from the sensor. While this "cleaning operation" may produce a better query result, it may involve a cost and fail. We investigate the problem of selecting entities to be cleaned under a limited budget. Particularly, we propose an optimal solution and several heuristics. Experiments show that the greedy algorithm is efficient and close to optimal.
AB - The information managed in emerging applications, such as sensor networks, location-based services, and data integration, is inherently imprecise. To handle data uncertainty, probabilistic databases have been recently developed. In this paper, we study how to quantify the ambiguity of answers returned by a probabilistic top-k query. We develop efficient algorithms to compute the quality of this query under the possible world semantics. We further address the cleaning of a probabilistic database, in order to improve top-k query quality. Cleaning involves the reduction of ambiguity associated with the database entities. For example, the uncertainty of a temperature value acquired from a sensor can be reduced, or cleaned, by requesting its newest value from the sensor. While this "cleaning operation" may produce a better query result, it may involve a cost and fail. We investigate the problem of selecting entities to be cleaned under a limited budget. Particularly, we propose an optimal solution and several heuristics. Experiments show that the greedy algorithm is efficient and close to optimal.
UR - https://www.scopus.com/pages/publications/84881328468
U2 - 10.1109/ICDE.2013.6544820
DO - 10.1109/ICDE.2013.6544820
M3 - 会议稿件
AN - SCOPUS:84881328468
SN - 9781467349086
T3 - Proceedings - International Conference on Data Engineering
SP - 134
EP - 145
BT - ICDE 2013 - 29th International Conference on Data Engineering
T2 - 29th International Conference on Data Engineering, ICDE 2013
Y2 - 8 April 2013 through 11 April 2013
ER -