TY - GEN
T1 - A Query-Aware Enormous Database Generator For System Performance Evaluation
AU - Huang, Xuhua
AU - Hu, Zirui
AU - Weng, Siyang
AU - Zhang, Rong
AU - Yang, Chengcheng
AU - Zhou, Xuan
AU - Qian, Weining
AU - Yang, Chuanhui
AU - Xu, Quanqing
N1 - Publisher Copyright: © 2025 ACM.
PY - 2025/6/22
Y1 - 2025/6/22
N2 - In production, simulating a real application without exposing private data is essential for database benchmarking and performance debugging. A rich body of query-aware database generators (QAGs) has been proposed for this purpose. However, the complex data dependencies hidden behind queries cause previous work to suffer from critical deficiencies in supporting complex operators with high simulation accuracy. To fill the gap between existing QAGs and these urgent demands, we implement Mirage, a data generator that reproduces applications from their queries, even those with complex operators, with a theoretical zero error. Specifically, Mirage leverages Query Rewriting and Set Transforming Rules to decouple the implicit dependencies from queries, which greatly simplifies the generation problem; it presents a uniform representation of various join types and formulates key population as a Constraint Programming (CP) problem, which can be solved by an off-the-shelf CP solver. In this demonstration, users can explore the core features of Mirage in generating synthetic databases; compared to related work, Mirage has the widest operator support and the best simulation fidelity.
KW - database generation
KW - performance benchmarking
UR - https://www.scopus.com/pages/publications/105010222565
U2 - 10.1145/3722212.3725076
DO - 10.1145/3722212.3725076
M3 - Conference contribution
AN - SCOPUS:105010222565
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 131
EP - 134
BT - SIGMOD-Companion 2025 - Companion of the 2025 International Conference on Management of Data
A2 - Deshpande, Amol
A2 - Aboulnaga, Ashraf
A2 - Salimi, Babak
A2 - Chandramouli, Badrish
A2 - Howe, Bill
A2 - Loo, Boon Thau
A2 - Glavic, Boris
A2 - Curino, Carlo
A2 - Wang, Daisy Zhe
A2 - Suciu, Dan
A2 - Abadi, Daniel
A2 - Srivastava, Divesh
A2 - Wu, Eugene
A2 - Nawab, Faisal
A2 - Ilyas, Ihab
A2 - Naughton, Jeffrey
A2 - Rogers, Jennie
A2 - Patel, Jignesh
A2 - Arulraj, Joy
A2 - Yang, Jun
A2 - Echihabi, Karima
A2 - Ross, Kenneth
A2 - Daudjee, Khuzaima
A2 - Lakshmanan, Laks
A2 - Garofalakis, Minos
A2 - Riedewald, Mirek
A2 - Mokbel, Mohamed
A2 - Ouzzani, Mourad
A2 - Kennedy, Oliver
A2 - Papotti, Paolo
A2 - Alvaro, Peter
A2 - Bailis, Peter
A2 - Miller, Renee
A2 - Roy, Senjuti Basu
A2 - Melnik, Sergey
A2 - Idreos, Stratos
A2 - Roy, Sudeepa
A2 - Rekatsinas, Theodoros
A2 - Leis, Viktor
A2 - Zhou, Wenchao
A2 - Gatterbauer, Wolfgang
A2 - Ives, Zack
PB - Association for Computing Machinery
T2 - 2025 ACM SIGMOD/PODS International Conference on Management of Data, SIGMOD-Companion 2025
Y2 - 22 June 2025 through 27 June 2025
ER -