Column-oriented query execution engine for OLAP based on triplet

Yue An Zhu, Yan Song Zhang, Xuan Zhou, Shan Wang

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Integrating big data with traditional data warehouse (DW) techniques brings a demand for real-time big data analysis. This new demand means that a DW cannot rely heavily on optimizations such as materialization and indexing, which consume large amounts of space; instead, it needs stronger real-time analysis capability to handle big data workloads, which usually issue complex queries over huge data volumes. Such queries usually apply grouping or aggregation operators to the result of joining the fact table with one or more dimension tables. The join and group operations are often the performance bottlenecks. This paper studies OLAP performance on new hardware platforms and in the big data environment, and develops a new OLAP query execution engine for columnar storage, called CDDTA-MMDB (columnar direct dimensional tuple access for main memory database query execution engine). Its optimized materialization lets CDDTA-MMDB reduce accesses to base tables and intermediate data structures during the join procedure. CDDTA-MMDB decomposes a query into sub-queries on the fact table and on the dimension tables, respectively. If a sub-query on a dimension table serves only as a filter, it produces binary tuples of the form <surrogate, Boolean_value>; otherwise, it produces triplets of the form <surrogate, key, value>. Thus, by scanning the fact table in a single pass and using the foreign keys in the fact table as a mapping function to directly access the binary tuples or triplets, the executor can accomplish the join, filter and group operations. The design fully takes into account the design principles of main-memory columnar databases. Experimental results show that the system is efficient and can be 2.5 times faster than MonetDB 5.5 and 5 times faster than the invisible join used by C-Store. Moreover, it scales linearly on multi-core processors.
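The one-pass scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the CDDTA-MMDB implementation: the toy schema, the column names and the helpers `filter_vector`, `group_vector` and `scan_fact` are all hypothetical. It assumes that surrogate keys are dense integers starting at 0, so that a foreign key in the fact table can serve directly as an array index. The abstract does not spell out the roles of key and value in the triplet, so the sketch simply stores one group value per surrogate position.

```python
from collections import defaultdict

# Hypothetical toy star schema, stored column-wise (one list per column).
# Assumption: surrogate keys are the dense positions 0..n-1 in each dimension.
customer = {"region": ["ASIA", "EUROPE", "ASIA"]}   # surrogates 0, 1, 2
date_dim = {"year": [1996, 1997, 1997]}             # surrogates 0, 1, 2
fact = {
    "cust_fk": [0, 1, 2, 0, 2],
    "date_fk": [0, 1, 1, 2, 0],
    "revenue": [10, 20, 30, 40, 50],
}


def filter_vector(column, predicate):
    """Filter-only dimension sub-query: one Boolean per surrogate position."""
    return [predicate(v) for v in column]


def group_vector(column, predicate=lambda v: True):
    """Grouping dimension sub-query: the group value per surrogate position,
    or None where the dimension row fails the predicate."""
    return [v if predicate(v) else None for v in column]


def scan_fact(fact, cust_filter, date_groups):
    """Single pass over the fact table. Each foreign key indexes straight
    into the dimension vectors, so no hash-table probe is needed."""
    totals = defaultdict(int)
    rows = zip(fact["cust_fk"], fact["date_fk"], fact["revenue"])
    for cust_fk, date_fk, revenue in rows:
        if not cust_filter[cust_fk]:      # filter via the Boolean vector
            continue
        group = date_groups[date_fk]      # join and group via direct lookup
        if group is None:
            continue
        totals[group] += revenue
    return dict(totals)


# Roughly: SELECT year, SUM(revenue) ... WHERE region = 'ASIA' GROUP BY year
cust_filter = filter_vector(customer["region"], lambda r: r == "ASIA")
date_groups = group_vector(date_dim["year"])
result = scan_fact(fact, cust_filter, date_groups)
print(result)  # {1996: 60, 1997: 70}
```

In this sketch the dimension tables are touched only while the small vectors are built. The fact-table scan then performs the join, filter and grouping through positional lookups alone.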

Original language: English
Pages (from-to): 753-767
Number of pages: 15
Journal: Ruan Jian Xue Bao/Journal of Software
Volume: 25
Issue number: 4
State: Published - 2014
Externally published: Yes

Keywords

  • Big data analysis
  • Join algorithm
  • Main-memory columnar database
  • Materialization
  • OLAP
