TY - GEN
T1 - Touchstone
T2 - 2018 USENIX Annual Technical Conference, USENIX ATC 2018
AU - Li, Yuming
AU - Zhang, Rong
AU - Yang, Xiaoyan
AU - Zhang, Zhenjie
AU - Zhou, Aoying
N1 - Publisher Copyright:
© Proceedings of the 2018 USENIX Annual Technical Conference, USENIX ATC 2018. All rights reserved.
PY - 2020
Y1 - 2020
N2 - Query-aware synthetic data generation is an essential and highly challenging task, important for database management system (DBMS) testing, database application testing and application-driven benchmarking. Prior studies on query-aware data generation suffer common problems of limited parallelization, poor scalability, and excessive memory consumption, making these systems unsatisfactory to terabyte scale data generation. In order to fill the gap between the existing data generation techniques and the emerging demands of enormous query-aware test databases, we design and implement our new data generator, called Touchstone. Touchstone adopts the random sampling algorithm instantiating the query parameters and the new data generation schema generating the test database, to achieve fully parallel data generation, linear scalability and austere memory consumption. Our experimental results show that Touchstone consistently outperforms the state-of-the-art solution on TPC-H workload by a 1000× speedup without sacrificing accuracy.
AB - Query-aware synthetic data generation is an essential and highly challenging task, important for database management system (DBMS) testing, database application testing and application-driven benchmarking. Prior studies on query-aware data generation suffer common problems of limited parallelization, poor scalability, and excessive memory consumption, making these systems unsatisfactory to terabyte scale data generation. In order to fill the gap between the existing data generation techniques and the emerging demands of enormous query-aware test databases, we design and implement our new data generator, called Touchstone. Touchstone adopts the random sampling algorithm instantiating the query parameters and the new data generation schema generating the test database, to achieve fully parallel data generation, linear scalability and austere memory consumption. Our experimental results show that Touchstone consistently outperforms the state-of-the-art solution on TPC-H workload by a 1000× speedup without sacrificing accuracy.
UR - https://www.scopus.com/pages/publications/85077435602
M3 - 会议稿件
AN - SCOPUS:85077435602
T3 - Proceedings of the 2018 USENIX Annual Technical Conference, USENIX ATC 2018
SP - 575
EP - 586
BT - Proceedings of the 2018 USENIX Annual Technical Conference, USENIX ATC 2018
PB - USENIX Association
Y2 - 11 July 2018 through 13 July 2018
ER -