A survey on management of data provenance

Research output: Contribution to journal › Article › peer-review

24 Scopus citations

Abstract

Data provenance describes how data is generated and how it evolves over time. It has many applications, including data quality evaluation, audit trails, replication recipes, and data citation. Provenance may be recorded across multiple data sources or within a single data source; in other words, the derivation history of data can be captured at either the schema level or the instance level. This paper surveys research on the representation and querying of data provenance at both levels. At the schema level, the focus is on query rewriting and schema mappings; at the instance level, the focus is on relational, XML, and streaming data provenance. Research on uncertain data provenance, which tracks both the derivation of data and its uncertainty, is also summarized. Finally, the paper lists applications of data provenance, discusses the main challenges, and identifies directions for future research.
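
To make "instance-level relational data provenance" concrete, here is a minimal sketch of provenance-semiring annotations (the N[X] provenance-polynomial semiring named by the "Provenance semiring" keyword), assuming a toy in-memory representation. Each base tuple carries a variable, a join multiplies the annotations of the tuples it combines, and projection or union adds the annotations of the derivations it merges. The relations R and S, the tuple ids r1, r2, s1, s2, and the helper functions var, add, mul, join_project, and to_str are all hypothetical illustrations, not taken from the paper.

```python
from itertools import product

# Hypothetical representation of a provenance polynomial in N[X]:
# maps a monomial (sorted tuple of base-tuple ids, repeated for exponents)
# to its natural-number coefficient.
Poly = dict[tuple[str, ...], int]


def var(name: str) -> Poly:
    """Annotation of a base tuple: a single variable."""
    return {(name,): 1}


def add(p: Poly, q: Poly) -> Poly:
    """Semiring addition: alternative derivations (union / projection)."""
    out = dict(p)
    for mono, c in q.items():
        out[mono] = out.get(mono, 0) + c
    return out


def mul(p: Poly, q: Poly) -> Poly:
    """Semiring multiplication: joint use of tuples (join)."""
    out: Poly = {}
    for (m1, c1), (m2, c2) in product(p.items(), q.items()):
        mono = tuple(sorted(m1 + m2))
        out[mono] = out.get(mono, 0) + c1 * c2
    return out


def join_project(r: dict, s: dict) -> dict:
    """Compute pi_{a,c}(R(a,b) join S(b,c)) over annotated relations."""
    out: dict = {}
    for (a, b1), p in r.items():
        for (b2, c), q in s.items():
            if b1 == b2:
                key = (a, c)
                # Each join match multiplies; collapsing matches that
                # project to the same output tuple adds.
                out[key] = add(out.get(key, {}), mul(p, q))
    return out


def to_str(p: Poly) -> str:
    """Render a polynomial such as 'r1*s1 + r2*s2'."""
    terms = []
    for mono, c in sorted(p.items()):
        body = "*".join(mono)
        terms.append(body if c == 1 else f"{c}*{body}")
    return " + ".join(terms)


# Toy annotated relations (hypothetical data).
R = {("x", "y"): var("r1"), ("x", "z"): var("r2")}
S = {("y", "w"): var("s1"), ("z", "w"): var("s2")}

result = join_project(R, S)
print(to_str(result[("x", "w")]))
```

The output tuple ("x", "w") is annotated r1*s1 + r2*s2, which records that it has two alternative derivations, each using one tuple of R joined with one tuple of S. Keeping a symbolic polynomial, rather than just a set of source ids, is what lets the same annotation later be evaluated into other semirings (for example, counts or boolean lineage).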

Original language: English
Pages (from-to): 373-389
Number of pages: 17
Journal: Jisuanji Xuebao/Chinese Journal of Computers
Volume: 33
Issue number: 3
State: Published - Mar 2010

Keywords

  • Data integration
  • Data provenance
  • Data space
  • Provenance semiring
  • Uncertain data

