Using wide table to manage web data: A survey

Bin Yang, Weining Qian*, Aoying Zhou

*Corresponding author for this work

Research output: Contribution to journal › Review article › peer-review

11 Scopus citations

Abstract

With the development of the World Wide Web (WWW), the storage and utilization of web data have become a major challenge for the data management research community. Web data are essentially heterogeneous and may change schema frequently, so the traditional relational data model is inappropriate for web data management. A new data model, called Wide Table (WT for short), was introduced for this task. The WT model has several characteristics. First, a WT is usually very sparsely populated, so that most data can fit into a single line or record. Second, queries are composed over only a small subset of the attributes. Thus, existing query processing and optimization techniques for relational databases with normalized tables no longer work efficiently. Furthermore, a WT is usually extremely large, and it is thought that only large-scale distributed storage can accommodate such a massive data set. In this paper, the requirements and challenges of web data management are discussed. Existing techniques for WT, including logical presentation, physical storage, and query processing, are introduced and analyzed in detail.
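The two characteristics named in the abstract (sparse population and queries over a small subset of attributes) can be illustrated with a minimal sketch. This is not taken from the paper: the dict-per-record layout, the `project` helper, and the sample rows are all hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, Iterable, Iterator, List

# Hypothetical sparse layout: each record stores only the attributes it has,
# instead of one column per attribute of the whole (very wide) schema.
Row = Dict[str, Any]


def project(rows: Iterable[Row], attrs: List[str]) -> Iterator[Row]:
    """Yield the requested attributes for rows that define at least one of them."""
    for row in rows:
        if any(a in row for a in attrs):
            # Attributes a row does not define are reported as None (NULL).
            yield {a: row.get(a) for a in attrs}


# Illustrative heterogeneous web records (made-up data).
rows: List[Row] = [
    {"id": 1, "title": "Camera X", "megapixels": 12},
    {"id": 2, "title": "Novel Y", "author": "A. Writer", "pages": 320},
    {"id": 3, "title": "Laptop Z", "ram_gb": 16},
]

# The logical wide table has one column per attribute seen in any record.
schema = set().union(*(row.keys() for row in rows))
filled_cells = sum(len(row) for row in rows)
sparsity = 1 - filled_cells / (len(schema) * len(rows))

# A query that touches only two of the schema's attributes.
result = list(project(rows, ["title", "author"]))
```

In this sketch a query reads only the attributes it names, and the share of empty cells in the logical table grows as more heterogeneous records are added.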

Original language: English
Pages (from-to): 211-223
Number of pages: 13
Journal: Frontiers of Computer Science in China
Volume: 2
Issue number: 3
State: Published - Sep 2008

Keywords

  • Flexible query processing
  • Large-scale distributed storage
  • Sparse data
  • Web data management
  • Wide table
