Chronos: An elastic parallel framework for stream benchmark generation and simulation

Ling Gu, Minqi Zhou*, Zhenjie Zhang, Ming Chien Shan, Aoying Zhou, Marianne Winslett

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

17 Scopus citations

Abstract

In the coming big data era, stress test to IT systems under extreme data volume is crucial to the adoption of computing technologies in every corner of the cyber world. Appropriately generated benchmark datasets provide the possibility for administrators to evaluate the capacity of the systems when real datasets hard obtained have not extreme cases. Traditional benchmark data generators, however, mainly target at producing relation tables of arbitrary size following fixed distributions. The output of such generators are insufficient when it is used to measure the stability of the architecture with extremely dynamic and heavy workloads, caused by complicated/hiden factors in the generation mechanism of real world, e.g. dependency between stocks in the trading market and collaborative human behaviors on the social network. In this paper, we present a new framework, called Chronos, to support new demands on streaming data benchmarking, by generating and simulating realistic and fast data streams in an elastic manner. Given a small group of samples with timestamps, Chronos reproduces new data streams with similar characteristics of the samples, preserving column-wise correlations, temporal dependency and order statistics of the snapshot distributions at the same time. To achieve such realistic requirements, we propose 1) a column decomposition optimization technique to partition the original relation table into small sub-tables with minimal correlation information loss, 2) a generative and extensible model based on Latent Dirichlet Allocation to capture temporal dependency while preserving order statistics of the snapshot distribution, and 3) a new generation and assembling method to efficiently build tuples following the expected distribution on the snapshots. To fulfill the vision of elasticity, we also present a new parallel stream data generation mechanism, facilitating distributed nodes to collaboratively generate tuples with minimal synchronization overhead and excellent load balancing. Our extensive experimental studies on real world data domains confirm the efficiency and effectiveness of Chronos on stream benchmark generation and simulation.

Original languageEnglish
Title of host publication2015 IEEE 31st International Conference on Data Engineering, ICDE 2015
PublisherIEEE Computer Society
Pages101-112
Number of pages12
ISBN (Electronic)9781479979639
DOIs
StatePublished - 26 May 2015
Event2015 31st IEEE International Conference on Data Engineering, ICDE 2015 - Seoul, Korea, Republic of
Duration: 13 Apr 201517 Apr 2015

Publication series

NameProceedings - International Conference on Data Engineering
Volume2015-May
ISSN (Print)1084-4627

Conference

Conference2015 31st IEEE International Conference on Data Engineering, ICDE 2015
Country/TerritoryKorea, Republic of
CitySeoul
Period13/04/1517/04/15

Fingerprint

Dive into the research topics of 'Chronos: An elastic parallel framework for stream benchmark generation and simulation'. Together they form a unique fingerprint.

Cite this