Tracing errors in probabilistic databases based on the Bayesian network

  • Liang Duan
  • Kun Yue*
  • Cheqing Jin
  • Wenlin Xu
  • Weiyi Liu

* Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

Data in probabilistic databases may not be absolutely correct, and worse, may be erroneous. Many existing data cleaning methods can detect errors in traditional databases, but they fall short of guiding us to errors in probabilistic databases, especially databases with complex correlations among data. In this paper, we propose a method for tracing errors in probabilistic databases that adopts the Bayesian network (BN) as the framework for representing the correlations among data. We first develop techniques to construct an augmented Bayesian network (ABN) for an anomalous query, representing the correlations among input data, intermediate data and output data during query execution. Inspired by the notion of blame in causal models, we then define a notion of blame for ranking candidate errors. Next, we provide an efficient method for computing the degree of blame of each candidate error, based on probabilistic inference over the ABN. Experimental results show the effectiveness and efficiency of our method.
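
The abstract does not give the ABN construction or the definition of blame, so neither is reproduced here. As a rough illustration of the kind of probabilistic inference the abstract and the "Rejection sampling" keyword refer to, the following is a minimal sketch of rejection sampling on a toy network. Everything in it is an assumption made for illustration: the three nodes (`t1`, `t2`, `out`), their probabilities, the join-like rule linking them, and the function names `sample_network` and `rejection_sample`. It estimates an ordinary posterior probability, not the paper's degree of blame.

```python
import random


def sample_network(rng):
    """Draw one joint sample from a hypothetical three-node network.

    t1 and t2 stand for uncertain input tuples; out stands for an output
    tuple that exists only when both inputs exist (a join-like rule).
    All probabilities here are made up for illustration.
    """
    t1 = rng.random() < 0.9  # input tuple t1 exists with probability 0.9
    t2 = rng.random() < 0.4  # input tuple t2 exists with probability 0.4
    out = t1 and t2          # deterministic dependency on both inputs
    return {"t1": t1, "t2": t2, "out": out}


def rejection_sample(query, evidence, n_samples=100_000, seed=0):
    """Estimate P(query is True | evidence) by rejection sampling.

    Samples that disagree with the evidence are discarded; the estimate is
    the fraction of the remaining samples in which the query node is True.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    accepted = 0
    hits = 0
    for _ in range(n_samples):
        sample = sample_network(rng)
        if all(sample[name] == value for name, value in evidence.items()):
            accepted += 1
            if sample[query]:
                hits += 1
    if accepted == 0:
        raise ValueError("no sample matched the evidence")
    return hits / accepted


if __name__ == "__main__":
    # Suppose the output tuple is unexpectedly missing from the query result.
    for node in ("t1", "t2"):
        estimate = rejection_sample(node, {"out": False})
        print(f"P({node} exists | out missing) is approximately {estimate:.3f}")
```

For this toy network the exact posteriors can be worked out by hand: P(t1 | out missing) = 0.54 / 0.64 = 0.84375 and P(t2 | out missing) = 0.04 / 0.64 = 0.0625, so the sampled estimates should land close to those values. Rejection sampling wastes every sample that contradicts the evidence, which is why it becomes slow when the evidence is improbable.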

Original language: English
Title of host publication: Database Systems for Advanced Applications - 20th International Conference, DASFAA 2015, Hanoi, Vietnam, April 20-23, 2015 Proceedings, Part II
Editors: Muhammad Aamir Cheema, Matthias Renz, Cyrus Shahabi, Xiaofang Zhou
Publisher: Springer Verlag
Pages: 104-119
Number of pages: 16
ISBN (Print): 9783319181226
State: Published - 2015
Event: 20th International Conference on Database Systems for Advanced Applications, DASFAA 2015 - Hanoi, Viet Nam
Duration: 20 Apr 2015 to 23 Apr 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9050
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 20th International Conference on Database Systems for Advanced Applications, DASFAA 2015
Country/Territory: Viet Nam
City: Hanoi
Period: 20/04/15 to 23/04/15

Keywords

  • Bayesian network
  • Data cleaning
  • Probabilistic database
  • Probabilistic inference
  • Rejection sampling
