
Distributed Stream Rebalance for Stateful Operator under Workload Variance

  • Junhua Fang
  • Rong Zhang*
  • Tom Z.J. Fu
  • Zhenjie Zhang
  • Aoying Zhou
  • Xiaofang Zhou

*Corresponding author for this work

Research output: Contribution to journal, article, peer-reviewed

Abstract

Key-based workload partitioning is now commonly used in parallel stream processing, enabling effective distribution of key-value tuples over the worker threads of a logical operator. Randomized hashing on the keys can balance the workload for key-based partitioning when the keys generally follow a static distribution, but it is likely to balance poorly when workload variance occurs in the incoming data stream. This paper presents a new key-based workload partitioning framework, with practical algorithms to support dynamic workload assignment for stateful operators. The framework combines hash-based and explicit key-based routing strategies for workload distribution: it specifies the destination worker threads for a handful of keys and assigns the remaining keys with the hash function. We formulate the rebalance operation as an optimization problem with multiple objectives: minimizing state migration costs, controlling the size of the routing table, and reducing workload imbalance among the worker threads. Despite the NP-hardness of the optimization formulation, we carefully investigate and justify the heuristics behind key (re)routing and state migration, to facilitate fast response to workload variance at negligible cost to normal processing in the distributed system. Empirical studies on synthetic data and real-world stream applications validate the usefulness of our proposals and show a substantial advantage of our approaches over state-of-the-art solutions in the literature.
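
The hybrid routing idea in the abstract (an explicit routing table for a handful of keys, hashing for everything else) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the class name `HybridRouter`, the use of CRC32 as the hash function, the `max_table_size` bound, and the greedy rebalance heuristic are all assumptions chosen for illustration. Tuple counts per key stand in for workload, and each table entry stands in for one state migration.

```python
import zlib


class HybridRouter:
    """Hypothetical router: explicit table entries override hash routing."""

    def __init__(self, num_workers, max_table_size):
        self.num_workers = num_workers
        self.max_table_size = max_table_size  # assumed bound on table size
        self.table = {}  # explicit key -> worker assignments

    def hash_route(self, key):
        # CRC32 is stable across runs, unlike Python's built-in hash() on str.
        return zlib.crc32(str(key).encode("utf-8")) % self.num_workers

    def route(self, key):
        # Keys in the routing table go to their assigned worker;
        # all other keys fall back to the hash function.
        if key in self.table:
            return self.table[key]
        return self.hash_route(key)

    def loads(self, key_counts):
        # Total tuple count per worker under the current routing.
        loads = [0] * self.num_workers
        for key, count in key_counts.items():
            loads[self.route(key)] += count
        return loads

    def rebalance(self, key_counts):
        """Greedy illustration only: repeatedly move the heaviest eligible key
        from the most loaded worker to the least loaded one."""
        migrated = []
        workers = range(self.num_workers)
        while len(self.table) < self.max_table_size:
            loads = self.loads(key_counts)
            src = max(workers, key=loads.__getitem__)
            dst = min(workers, key=loads.__getitem__)
            gap = loads[src] - loads[dst]
            # A key lighter than the gap can move without making dst
            # heavier than src was, so the maximum load never increases.
            candidates = [
                k for k, c in key_counts.items()
                if k not in self.table and self.route(k) == src and 0 < c < gap
            ]
            if not candidates:
                break
            key = max(candidates, key=key_counts.get)
            self.table[key] = dst  # this key's state would migrate to dst
            migrated.append(key)
        return migrated
```

Calling `rebalance` with a dictionary of observed per-key tuple counts returns the keys whose state would have to move. The sketch only hints at the trade-off named in the abstract: a larger `max_table_size` allows a tighter balance at the price of a bigger routing table and more migrations.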

Original language: English
Article number: 8340854
Pages (from-to): 2223-2240
Number of pages: 18
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 29
Issue number: 10
DOI
Publication status: Published - 1 Oct 2018
