TY - JOUR
T1 - Using wide table to manage web data
T2 - A survey
AU - Yang, Bin
AU - Qian, Weining
AU - Zhou, Aoying
PY - 2008/9
Y1 - 2008/9
N2 - With the development of the World Wide Web (WWW), the storage and utilization of web data have become a major challenge for the data management research community. Web data are essentially heterogeneous and may change schema frequently, so the traditional relational data model is inappropriate for web data management. A new data model, called Wide Table (WT), was introduced for this task. The WT model has several characteristics. First, a WT is usually very sparsely populated, so that most data can fit into a single line or record. Second, queries are composed on only a small subset of the attributes. Thus, existing query processing and optimization techniques for relational databases with normalized tables no longer work efficiently. Furthermore, a WT is usually of extremely large volume. It is thought that only large-scale distributed storage can accommodate the massive data set. In this paper, the requirements and challenges of web data management are discussed. Existing techniques for WT, including logical presentation, physical storage, and query processing, are introduced and analyzed in detail.
AB - With the development of the World Wide Web (WWW), the storage and utilization of web data have become a major challenge for the data management research community. Web data are essentially heterogeneous and may change schema frequently, so the traditional relational data model is inappropriate for web data management. A new data model, called Wide Table (WT), was introduced for this task. The WT model has several characteristics. First, a WT is usually very sparsely populated, so that most data can fit into a single line or record. Second, queries are composed on only a small subset of the attributes. Thus, existing query processing and optimization techniques for relational databases with normalized tables no longer work efficiently. Furthermore, a WT is usually of extremely large volume. It is thought that only large-scale distributed storage can accommodate the massive data set. In this paper, the requirements and challenges of web data management are discussed. Existing techniques for WT, including logical presentation, physical storage, and query processing, are introduced and analyzed in detail.
KW - Flexible query processing
KW - Large-scale distributed storage
KW - Sparse data
KW - Web data management
KW - Wide table
UR - https://www.scopus.com/pages/publications/49549106707
U2 - 10.1007/s11704-008-0050-7
DO - 10.1007/s11704-008-0050-7
M3 - Literature review
AN - SCOPUS:49549106707
SN - 1673-7350
VL - 2
SP - 211
EP - 223
JO - Frontiers of Computer Science in China
JF - Frontiers of Computer Science in China
IS - 3
ER -