Mirage: Generating Enormous Databases for Complex Workloads

Qingshuai Wang, Hao Li, Zirui Hu, Rong Zhang*, Chengcheng Yang*, Peng Cai, Xuan Zhou, Aoying Zhou

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Optimizing query parallelism techniques requires substantial workloads with specific query plans and a customized output size for each operator (called a cardinality constraint). To this end, a rich body of query-aware database generators (QAGs) has been proposed. However, the complex data dependencies hidden behind queries leave previous QAGs deficient in supporting complex operators and in controlling generation errors. In this paper, we design a new generator, Mirage, which supports complex operators well and keeps error bounds low for cardinality constraints. First, Mirage leverages Query Rewriting and Set Transforming Rules to decouple the dependencies between key and non-key columns, so that each can be generated individually. Then, for the non-key columns, Mirage abstracts the cardinality constraints of operators as placement requirements within each column's domain, and models the generation problem as a classic bin packing problem. Finally, for the key columns, Mirage proposes a uniform representation of join cardinality constraints for all types of PK-FK joins and partitions the data according to the matching status between PK and FK columns. It then formulates key population as a Constraint Programming (CP) problem, which can be solved by an existing CP solver. The experiments show that Mirage outperforms all previous work in either operator support or generation error.
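To make the notion of a cardinality constraint concrete, the following is a minimal sketch for a single integer non-key column with predicates of the form `col < threshold`. It is not Mirage's algorithm (the abstract does not give one, and this sketch uses neither bin packing nor a CP solver); the function names, the predicate form, and the sample constraints are all assumptions made for illustration. It only shows how target output sizes can be read as placement requirements inside a column's domain.

```python
def generate_column(n_rows, constraints, domain_max):
    """Toy generator (hypothetical, not from the paper).

    constraints: list of (threshold, target_count) pairs, each meaning
    "the predicate `col < threshold` must return exactly target_count rows".
    Values are drawn from the integer domain [0, domain_max).
    Thresholds must be distinct and positive.
    """
    values = []
    low = 0  # smallest value that is still free to place in the next interval
    for threshold, target in sorted(constraints):
        need = target - len(values)  # rows that must fall in [low, threshold)
        if need < 0 or target > n_rows or not (low < threshold <= domain_max):
            raise ValueError("inconsistent cardinality constraints")
        values.extend([low] * need)
        low = threshold
    remaining = n_rows - len(values)  # rows that must satisfy no predicate
    if remaining > 0 and low >= domain_max:
        raise ValueError("no domain value left for the remaining rows")
    values.extend([low] * remaining)
    return values


def output_size(values, threshold):
    """Count the rows selected by the predicate `col < threshold`."""
    return sum(1 for v in values if v < threshold)


# Hypothetical workload: 100 rows, two selection predicates.
column = generate_column(100, [(10, 30), (50, 70)], domain_max=100)
print(output_size(column, 10), output_size(column, 50), len(column))
```

With several columns and overlapping predicates the placement requirements interact, which is where a formulation such as the bin packing model mentioned in the abstract becomes relevant.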

Original language: English
Title of host publication: Proceedings - 2024 IEEE 40th International Conference on Data Engineering, ICDE 2024
Publisher: IEEE Computer Society
Pages: 3989-4001
Number of pages: 13
ISBN (Electronic): 9798350317152
State: Published - 2024
Event: 40th IEEE International Conference on Data Engineering, ICDE 2024 - Utrecht, Netherlands
Duration: 13 May 2024 to 17 May 2024

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627
ISSN (Electronic): 2375-0286

Conference

Conference: 40th IEEE International Conference on Data Engineering, ICDE 2024
Country/Territory: Netherlands
City: Utrecht
Period: 13/05/24 to 17/05/24

Keywords

  • benchmarking
  • performance evaluation
  • query optimization
  • query-aware database generator
