TY - JOUR
T1 - A Scalable Query-Aware Enormous Database Generator for Database Evaluation
AU - Wang, Qingshuai
AU - Li, Yuming
AU - Zhang, Rong
AU - Shu, Ke
AU - Zhang, Zhenjie
AU - Zhou, Aoying
N1 - Publisher Copyright: © 1989-2012 IEEE.
PY - 2023/5/1
Y1 - 2023/5/1
N2 - Query-aware synthetic data generation is an essential and highly challenging task, important for database management system (DBMS) testing, database application testing, and application-driven benchmarking. Prior studies on query-aware data generation suffer from common problems of limited parallelization, poor scalability, and excessive memory consumption, making these systems unsuitable for terabyte-scale data generation. To fill the gap between existing data generation techniques and the emerging demand for enormous query-aware test databases, we design and implement a new data generator called Touchstone. Touchstone adopts a random sampling algorithm to instantiate query parameters and a new data generation schema to generate the test database, achieving fully parallel data generation, linear scalability, and austere memory consumption. It fully supports outer joins as well as non-equi-joins for application-oriented data generation. Our experimental results show that Touchstone consistently outperforms the state-of-the-art solution on the TPC-H workload, with a 1000× speedup, without sacrificing simulation fidelity.
AB - Query-aware synthetic data generation is an essential and highly challenging task, important for database management system (DBMS) testing, database application testing, and application-driven benchmarking. Prior studies on query-aware data generation suffer from common problems of limited parallelization, poor scalability, and excessive memory consumption, making these systems unsuitable for terabyte-scale data generation. To fill the gap between existing data generation techniques and the emerging demand for enormous query-aware test databases, we design and implement a new data generator called Touchstone. Touchstone adopts a random sampling algorithm to instantiate query parameters and a new data generation schema to generate the test database, achieving fully parallel data generation, linear scalability, and austere memory consumption. It fully supports outer joins as well as non-equi-joins for application-oriented data generation. Our experimental results show that Touchstone consistently outperforms the state-of-the-art solution on the TPC-H workload, with a 1000× speedup, without sacrificing simulation fidelity.
KW - OLAP database testing
KW - Query-aware data generator
KW - query generator
UR - https://www.scopus.com/pages/publications/85125317106
U2 - 10.1109/TKDE.2022.3153651
DO - 10.1109/TKDE.2022.3153651
M3 - Article
AN - SCOPUS:85125317106
SN - 1041-4347
VL - 35
SP - 4395
EP - 4410
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 5
ER -