
High-Performance Data Distribution Algorithm on Distributed Stream Systems

  • East China Normal University

Research output: Contribution to journal › Article › peer-review

Abstract

As big data applications become widespread, scalable and efficient stream join processing plays an increasingly important role in online real-time analysis. Distributed parallel processing frameworks provide an effective solution for processing massive data streams with low latency. For key-based computations, data skew and the inherent features of stream data, such as its real-time nature, dynamics, and unpredictable volume, lead to load imbalance in distributed processing systems. This imbalance degrades performance and wastes hardware resources. There are two existing solutions to load imbalance: 1) a key-based migration scheme that keeps the load balanced among parallel processing nodes; 2) a tuple-based partitioning scheme that distributes data randomly to achieve load balance. The former adjusts the system to within a defined equilibrium range, a task that resembles the one-dimensional packing problem. The latter has to maintain the accuracy of key-based operations, which inevitably incurs additional memory and network communication costs. This paper presents a novel parallel processing scheme that combines the key-based and tuple-based schemes to partition keys on demand. The proposed scheme adopts a lightweight load balancing algorithm and a partitioning scheme that retains the characteristics of key-based operations, thus achieving the load balance of the tuple-based strategy while reducing the additional cost of fine-grained balancing.
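The contrast between key-based and tuple-based partitioning can be illustrated with a small sketch. This is not the algorithm from the paper, whose details the abstract does not give. It is a toy illustration under stated assumptions: the class name `HybridPartitioner`, the `hot_threshold` parameter, and the round-robin spreading rule are all hypothetical. Keys are routed by hash to a single home node (key-based) until they are seen more than `hot_threshold` times, after which their tuples are rotated across all nodes (tuple-based).

```python
import zlib
from collections import Counter


class HybridPartitioner:
    """Toy partitioner: hash-route each key, but spread hot keys round-robin.

    Illustrative only; names and the hotness rule are assumptions,
    not the scheme proposed in the paper.
    """

    def __init__(self, num_nodes, hot_threshold):
        self.num_nodes = num_nodes
        # Hypothetical rule: a key becomes "hot" once it has been
        # seen more than hot_threshold times.
        self.hot_threshold = hot_threshold
        self.seen = Counter()       # tuples observed per key
        self.next_slot = Counter()  # round-robin offset per hot key

    def _home(self, key):
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        return zlib.crc32(str(key).encode("utf-8")) % self.num_nodes

    def route(self, key):
        """Return the node index that should receive the next tuple of `key`."""
        self.seen[key] += 1
        if self.seen[key] <= self.hot_threshold:
            # Key-based path: all tuples of a cold key share one node.
            return self._home(key)
        # Tuple-based path: rotate tuples of a hot key over every node.
        slot = self.next_slot[key]
        self.next_slot[key] += 1
        return (self._home(key) + slot) % self.num_nodes

    def nodes_for(self, key):
        """Nodes that may hold state for `key`; a join has to probe all of them."""
        if self.seen[key] <= self.hot_threshold:
            return [self._home(key)]
        return list(range(self.num_nodes))


if __name__ == "__main__":
    p = HybridPartitioner(num_nodes=4, hot_threshold=3)
    stream = ["hot"] * 403 + ["rare"] * 2
    loads = Counter(p.route(k) for k in stream)
    print(dict(loads))
    print(p.nodes_for("rare"), p.nodes_for("hot"))
```

The sketch makes the trade-off in the abstract concrete: a cold key needs state on one node only, while a hot key gains balance at the cost of being probed on every node, which is the extra memory and communication overhead the tuple-based scheme incurs.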

Original language: English
Pages (from-to): 563-578
Number of pages: 16
Journal: Ruan Jian Xue Bao/Journal of Software
Volume: 28
Issue number: 3
DOI
Publication status: Published - 1 Mar 2017
