Distributed stream join under workload variance

  • East China Normal University

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Flexible and self-adaptive stream join processing plays an important role in parallel shared-nothing environments. The join-matrix model is a high-performance model that is resilient to data skew and supports arbitrary join predicates, because it uses random tuple distribution as its routing policy. To maximize system throughput and minimize network communication cost, a scalable partitioning scheme on the matrix is critical. In this paper, we present a novel, flexible, and adaptive scheme partitioning model for the stream join operator, which ensures high throughput with economical resource usage by allocating resources on demand. Specifically, a lightweight scheme generator, which requires a sample of each stream's volume and the processing resource quota of each physical machine, generates a join scheme; a migration plan generator then decides how to migrate data among machines so as to minimize migration cost while ensuring correctness. We conduct extensive experiments on different kinds of join workloads, and the evaluation shows that the approach is highly competitive with baseline systems on both benchmark data and real data.
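The random routing policy of the join-matrix model mentioned in the abstract can be illustrated with a small in-memory sketch. This is not the paper's implementation: the class name `JoinMatrix`, its methods, and the fixed `rows` x `cols` grid are assumptions made for illustration, and the scheme generator and migration plan generator described above are not modeled. Each tuple of one stream is sent to a random row and replicated across that row's cells; each tuple of the other stream is sent to a random column and replicated down it. Any pair of tuples therefore meets in exactly one cell, which is why arbitrary predicates are supported and skew in key values does not matter.

```python
import random
from collections import defaultdict


class JoinMatrix:
    """Toy join-matrix: a rows x cols grid of cells, each standing in for one task.

    Hypothetical illustration only; names and structure are assumptions.
    """

    def __init__(self, rows, cols, predicate, seed=None):
        self.rows = rows
        self.cols = cols
        self.predicate = predicate          # arbitrary join predicate f(r, s) -> bool
        self.rng = random.Random(seed)
        self.r_store = defaultdict(list)    # cell -> stored tuples of stream R
        self.s_store = defaultdict(list)    # cell -> stored tuples of stream S
        self.results = []

    def insert_r(self, r):
        # Route an R tuple to a random row and replicate it to every cell in that row.
        row = self.rng.randrange(self.rows)
        for col in range(self.cols):
            cell = (row, col)
            # Probe the S tuples already stored in this cell, then store r.
            for s in self.s_store[cell]:
                if self.predicate(r, s):
                    self.results.append((r, s))
            self.r_store[cell].append(r)

    def insert_s(self, s):
        # Route an S tuple to a random column and replicate it to every cell in that column.
        col = self.rng.randrange(self.cols)
        for row in range(self.rows):
            cell = (row, col)
            # Probe the R tuples already stored in this cell, then store s.
            for r in self.r_store[cell]:
                if self.predicate(r, s):
                    self.results.append((r, s))
            self.s_store[cell].append(s)


if __name__ == "__main__":
    # A non-equi (theta) join: pair up tuples with r < s.
    matrix = JoinMatrix(rows=2, cols=3, predicate=lambda r, s: r < s, seed=1)
    for r, s in zip([1, 5, 9], [2, 6, 10]):
        matrix.insert_r(r)
        matrix.insert_s(s)
    print(sorted(matrix.results))
```

In this sketch every R tuple is stored `cols` times and every S tuple `rows` times, so the shape of the grid trades storage and network cost against parallelism. That trade-off is what a partitioning scheme on the matrix has to balance.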

Original language: English
Pages (from-to): 1089-1110
Number of pages: 22
Journal: World Wide Web
Volume: 20
Issue number: 5
DOI
Publication status: Published - 1 Sep 2017
