Flexible and adaptive stream join algorithm

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

Flexibility and self-adaptivity are important to real-time join processing in a parallel shared-nothing environment. Join-Matrix is a high-performance model for distributed stream joins that supports arbitrary join predicates. It can handle data skew perfectly because it routes tuples randomly to cells, with each stream corresponding to one side of the matrix. The design of the matrix partitioning scheme is a determining factor in maximizing system throughput while economizing computing resources. In this paper, we propose a novel flexible and adaptive scheme-partitioning algorithm for the stream join operator, which ensures high throughput with economical resource usage by allocating resources on demand. Specifically, a lightweight scheme generator, which requires a sample of each stream's volume and the processing resource quota of each physical machine, generates a join scheme; a migration plan generator then decides how to migrate data among machines so as to minimize migration cost while ensuring correctness. Extensive experiments on different kinds of join workloads show that the approach is highly competitive with baseline systems on benchmarks.
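The join-matrix routing idea mentioned in the abstract can be illustrated with a minimal single-process sketch. It is not the paper's algorithm: the class name `JoinMatrix`, its methods, and the in-memory cell layout are hypothetical, and the scheme generator and migration plan generator are not modeled. The sketch assumes the usual join-matrix convention, in which a tuple from one stream is assigned a random row and replicated across that row, while a tuple from the other stream is assigned a random column and replicated down that column, so every pair of tuples meets in exactly one cell regardless of the join predicate.

```python
import random


class JoinMatrix:
    """Toy simulation of a rows x cols join matrix (hypothetical names)."""

    def __init__(self, rows, cols, predicate, seed=None):
        self.rows = rows
        self.cols = cols
        self.predicate = predicate  # arbitrary (theta) join predicate
        self.rng = random.Random(seed)
        # Each cell keeps the R tuples and S tuples that were routed to it.
        self.cells = {
            (i, j): ([], []) for i in range(rows) for j in range(cols)
        }

    def insert_r(self, r):
        """Route an R tuple to a random row and probe every cell in that row."""
        i = self.rng.randrange(self.rows)
        matches = []
        for j in range(self.cols):
            r_store, s_store = self.cells[(i, j)]
            matches.extend((r, s) for s in s_store if self.predicate(r, s))
            r_store.append(r)
        return matches

    def insert_s(self, s):
        """Route an S tuple to a random column and probe every cell in it."""
        j = self.rng.randrange(self.cols)
        matches = []
        for i in range(self.rows):
            r_store, s_store = self.cells[(i, j)]
            matches.extend((r, s) for r in r_store if self.predicate(r, s))
            s_store.append(s)
        return matches


def run_join(matrix, arrivals):
    """Feed ('R', value) / ('S', value) arrivals in order; collect results."""
    results = []
    for stream, value in arrivals:
        if stream == "R":
            results.extend(matrix.insert_r(value))
        else:
            results.extend(matrix.insert_s(value))
    return results
```

Because the row and column are chosen at random rather than by key, the load per cell depends only on stream volumes and the matrix shape, not on the distribution of join-key values. The cost is replication: each R tuple is stored `cols` times and each S tuple `rows` times, which is why the choice of matrix shape matters for resource usage.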

Original language: English
Title of host publication: Web Technologies and Applications - 18th Asia-Pacific Web Conference, APWeb 2016, Proceedings
Editors: Kyuseok Shim, Kai Zheng, Guanfeng Liu, Feifei Li
Publisher: Springer Verlag
Pages: 3-16
Number of pages: 14
ISBN (Print): 9783319458168
State: Published - 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9932 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349
