TY - JOUR
T1 - Cost-effective data partition for distributed stream processing system
AU - Wang, Xiaotong
AU - Fang, Junhua
AU - Li, Yuming
AU - Zhang, Rong
AU - Zhou, Aoying
N1 - Publisher Copyright: © Springer International Publishing AG 2017.
PY - 2017
Y1 - 2017
N2 - Data skew and dynamics greatly affect the throughput of stream processing systems. An efficient partitioning method is therefore required to distribute workload evenly in a distributed and parallel manner. Previous research mainly focuses on load-balancing adjustment at key-as-granularity or tuple-as-granularity, both of which have limitations such as clumsy balancing activities or expensive network cost. In this paper, we present a comprehensive cost model for partitioning methods, which jointly estimates memory, CPU and network resource utilization. Based on this cost model, we propose a novel load-balancing adjustment algorithm that adopts the idea of “Split keys on demand and Merge keys as far as possible” and adapts to differently skewed workloads. Our evaluation demonstrates that our method outperforms state-of-the-art partitioning schemes while maintaining high throughput and resource utilization.
AB - Data skew and dynamics greatly affect the throughput of stream processing systems. An efficient partitioning method is therefore required to distribute workload evenly in a distributed and parallel manner. Previous research mainly focuses on load-balancing adjustment at key-as-granularity or tuple-as-granularity, both of which have limitations such as clumsy balancing activities or expensive network cost. In this paper, we present a comprehensive cost model for partitioning methods, which jointly estimates memory, CPU and network resource utilization. Based on this cost model, we propose a novel load-balancing adjustment algorithm that adopts the idea of “Split keys on demand and Merge keys as far as possible” and adapts to differently skewed workloads. Our evaluation demonstrates that our method outperforms state-of-the-art partitioning schemes while maintaining high throughput and resource utilization.
UR - https://www.scopus.com/pages/publications/85032264886
U2 - 10.1007/978-3-319-55699-4_39
DO - 10.1007/978-3-319-55699-4_39
M3 - Conference article
AN - SCOPUS:85032264886
SN - 0302-9743
VL - 10178 LNCS
SP - 623
EP - 635
JO - Lecture Notes in Computer Science
JF - Lecture Notes in Computer Science
T2 - 22nd International Conference on Database Systems for Advanced Applications, DASFAA 2017
Y2 - 27 March 2017 through 30 March 2017
ER -