TY - GEN
T1 - A framework for OLAP in column-store database
T2 - 15th Asia-Pacific Web Conference on Web Technologies and Applications, APWeb 2013
AU - Zhu, Yuean
AU - Zhang, Yansong
AU - Zhou, Xuan
AU - Wang, Shan
PY - 2013
Y1 - 2013
N2 - In a data warehouse modeled with the star schema, data are usually retrieved by joining the fact table with one or more dimension tables, followed by selection and projection; the join is the most expensive operator in an RDBMS. A column-store database can perform a join in two ways: early materialization join (EM join) and late materialization join (LM join). In an EM join, the columns involved in the query are first glued together into rows, and the glued rows are then sent to the join operator. In an LM join, by contrast, only the attributes that participate in the join are accessed. An LM join cannot ignore the problem of out-of-order access to the inner table; if it does, the naïve LM join is usually slower than the EM join [9]. Because late materialization is good for memory bandwidth and CPU efficiency, the LM join attracts more attention in the academic research community. State-of-the-art LM joins in column stores, such as the radix-cluster hash join [8] in MonetDB and the invisible join [10] in C-Store, all try to avoid random table access. In this paper, we devise a framework for OLAP called CDDTA-MMDB, which introduces a new join algorithm called CDDTA-LWMJoin (abbreviated to LWMJoin below). LWMJoin builds on our prior work, CDDTA-Join [7]. We equip CDDTA-Join with light-weight materialization (LWM), which is designed to cut down memory accesses and reduce the production of intermediate data structures. Experiments show that CDDTA-MMDB is efficient and can be 2x faster than MonetDB and 4x faster than invisible join for data warehouses modeled with the star schema.
AB - In a data warehouse modeled with the star schema, data are usually retrieved by joining the fact table with one or more dimension tables, followed by selection and projection; the join is the most expensive operator in an RDBMS. A column-store database can perform a join in two ways: early materialization join (EM join) and late materialization join (LM join). In an EM join, the columns involved in the query are first glued together into rows, and the glued rows are then sent to the join operator. In an LM join, by contrast, only the attributes that participate in the join are accessed. An LM join cannot ignore the problem of out-of-order access to the inner table; if it does, the naïve LM join is usually slower than the EM join [9]. Because late materialization is good for memory bandwidth and CPU efficiency, the LM join attracts more attention in the academic research community. State-of-the-art LM joins in column stores, such as the radix-cluster hash join [8] in MonetDB and the invisible join [10] in C-Store, all try to avoid random table access. In this paper, we devise a framework for OLAP called CDDTA-MMDB, which introduces a new join algorithm called CDDTA-LWMJoin (abbreviated to LWMJoin below). LWMJoin builds on our prior work, CDDTA-Join [7]. We equip CDDTA-Join with light-weight materialization (LWM), which is designed to cut down memory accesses and reduce the production of intermediate data structures. Experiments show that CDDTA-MMDB is efficient and can be 2x faster than MonetDB and 4x faster than invisible join for data warehouses modeled with the star schema.
KW - OLAP
KW - in-memory column-store database
KW - join
KW - materialization
UR - https://www.scopus.com/pages/publications/84875850598
U2 - 10.1007/978-3-642-37401-2_63
DO - 10.1007/978-3-642-37401-2_63
M3 - Conference contribution
AN - SCOPUS:84875850598
SN - 9783642374005
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 646
EP - 653
BT - Web Technologies and Applications - 15th Asia-Pacific Web Conference, APWeb 2013, Proceedings
Y2 - 4 April 2013 through 6 April 2013
ER -