Touchstone: Generating enormous query-aware test databases

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

13 Scopus citations

Abstract

Query-aware synthetic data generation is an essential and highly challenging task, important for database management system (DBMS) testing, database application testing and application-driven benchmarking. Prior studies on query-aware data generation suffer common problems of limited parallelization, poor scalability, and excessive memory consumption, making these systems unsatisfactory to terabyte scale data generation. In order to fill the gap between the existing data generation techniques and the emerging demands of enormous query-aware test databases, we design and implement our new data generator, called Touchstone. Touchstone adopts the random sampling algorithm instantiating the query parameters and the new data generation schema generating the test database, to achieve fully parallel data generation, linear scalability and austere memory consumption. Our experimental results show that Touchstone consistently outperforms the state-of-the-art solution on TPC-H workload by a 1000× speedup without sacrificing accuracy.

Original languageEnglish
Title of host publicationProceedings of the 2018 USENIX Annual Technical Conference, USENIX ATC 2018
PublisherUSENIX Association
Pages575-586
Number of pages12
ISBN (Electronic)9781939133021
StatePublished - 2020
Event2018 USENIX Annual Technical Conference, USENIX ATC 2018 - Boston, United States
Duration: 11 Jul 201813 Jul 2018

Publication series

NameProceedings of the 2018 USENIX Annual Technical Conference, USENIX ATC 2018

Conference

Conference2018 USENIX Annual Technical Conference, USENIX ATC 2018
Country/TerritoryUnited States
CityBoston
Period11/07/1813/07/18

Fingerprint

Dive into the research topics of 'Touchstone: Generating enormous query-aware test databases'. Together they form a unique fingerprint.

Cite this