A Query-Aware Enormous Database Generator For System Performance Evaluation

Xuhua Huang, Zirui Hu, Siyang Weng, Rong Zhang, Chengcheng Yang, Xuan Zhou, Weining Qian, Chuanhui Yang, Quanqing Xu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In production, simulating a real application without exposing private data is essential for database benchmarking and performance debugging. A rich body of query-aware database generators (QAGs) has been proposed for this purpose. However, the complex data dependencies hidden behind queries leave previous work with critical deficiencies in supporting complex operators with high simulation accuracy. To fill the gap between existing QAGs and these urgent demands, we implement Mirage, a data generator that reproduces applications from their queries, even those with complex operators, and has a theoretical error of zero. Specifically, Mirage leverages Query Rewriting and Set Transforming Rules to decouple the implicit dependencies from queries, which greatly simplifies the generation problem; it presents a uniform representation of various join types and formulates key population as a Constraint Programming (CP) problem, which can be solved effectively by an off-the-shelf CP solver. In this demonstration, users can explore the core features of Mirage in generating synthetic databases; Mirage offers the widest operator support and the best simulation fidelity compared to related work.
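As a rough illustration of what treating key population as a constraint problem can look like, the sketch below assigns foreign keys so that given join cardinalities are met exactly. It is not Mirage's formulation: the partition model, the function name `populate_foreign_keys`, and the brute-force search are all assumptions made for this example, and a real system would pass the same constraints to a CP solver instead of enumerating candidates.

```python
from itertools import product


def populate_foreign_keys(child_sizes, n_parent_parts, join_constraints):
    """Toy key-population search (hypothetical model, not Mirage's).

    x[i][j] is the number of rows in child partition i whose foreign key
    points into parent partition j. Every child row must receive a key, and
    each constraint (child_parts, parent_parts, cardinality) requires the
    rows joining those partitions to total exactly `cardinality`.
    Returns one satisfying assignment, or None if none exists.
    """

    def compositions(total, parts):
        # Every way to split `total` rows across `parts` parent partitions.
        if parts == 1:
            yield (total,)
            return
        for first in range(total + 1):
            for rest in compositions(total - first, parts - 1):
                yield (first,) + rest

    # Candidate rows of x, one list per child partition.
    per_child = [list(compositions(size, n_parent_parts)) for size in child_sizes]

    # Exhaustive search stands in for the CP solver in this sketch.
    for x in product(*per_child):
        if all(
            sum(x[i][j] for i in child_parts for j in parent_parts) == cardinality
            for child_parts, parent_parts, cardinality in join_constraints
        ):
            return [list(row) for row in x]
    return None


# Two child partitions (4 and 6 rows) and two parent partitions.
constraints = [
    ([0], [0], 3),     # child partition 0 joined with parent partition 0 yields 3 rows
    ([0, 1], [1], 5),  # any child row joined with parent partition 1 yields 5 rows
]
solution = populate_foreign_keys([4, 6], 2, constraints)
print(solution)
```

Because the assignment is required to match every cardinality exactly rather than approximately, any solution reproduces the target join sizes with no error, which is the sense in which a constraint-based formulation can aim for zero error on the constrained queries.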

Original language: English
Title of host publication: SIGMOD-Companion 2025 - Companion of the 2025 International Conference on Management of Data
Editors: Amol Deshpande, Ashraf Aboulnaga, Babak Salimi, Badrish Chandramouli, Bill Howe, Boon Thau Loo, Boris Glavic, Carlo Curino, Daisy Zhe Wang, Dan Suciu, Daniel Abadi, Divesh Srivastava, Eugene Wu, Faisal Nawab, Ihab Ilyas, Jeffrey Naughton, Jennie Rogers, Jignesh Patel, Joy Arulraj, Jun Yang, Karima Echihabi, Kenneth Ross, Khuzaima Daudjee, Laks Lakshmanan, Minos Garofalakis, Mirek Riedewald, Mohamed Mokbel, Mourad Ouzzani, Oliver Kennedy, Paolo Papotti, Peter Alvaro, Peter Bailis, Renee Miller, Senjuti Basu Roy, Sergey Melnik, Stratos Idreos, Sudeepa Roy, Theodoros Rekatsinas, Viktor Leis, Wenchao Zhou, Wolfgang Gatterbauer, Zack Ives
Publisher: Association for Computing Machinery
Pages: 131-134
Number of pages: 4
ISBN (Electronic): 9798400715648
DOIs
State: Published - 22 Jun 2025
Event: 2025 ACM SIGMOD/PODS International Conference on Management of Data, SIGMOD-Companion 2025 - Berlin, Germany
Duration: 22 Jun 2025 → 27 Jun 2025

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Conference

Conference: 2025 ACM SIGMOD/PODS International Conference on Management of Data, SIGMOD-Companion 2025
Country/Territory: Germany
City: Berlin
Period: 22/06/25 → 27/06/25

Keywords

  • database generation
  • performance benchmarking
