High-Performance Data Distribution Algorithm on Distributed Stream Systems

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Along with the popularization of big data applications, scalable and efficient stream join processing plays an increasingly important role in online real-time analysis. Distributed parallel processing frameworks provide an effective way to process massive data streams with low latency. For key-based computations, however, data skew and the inherent characteristics of stream data, such as real-time arrival, dynamic behavior, and unpredictable data volume, cause load imbalance in distributed processing systems. This imbalance degrades performance and wastes hardware resources. There are two existing approaches to load imbalance: 1) key-based migration schemes, which keep the parallel processing nodes balanced by moving keys between them; 2) tuple-based partitioning schemes, which distribute tuples randomly to achieve load balance. The former adjusts the system until the load falls within a defined balance range, a problem that resembles one-dimensional bin packing. The latter must still preserve the correctness of key-based operations, which inevitably incurs additional memory and network communication costs. This paper presents a novel parallel processing scheme that combines the key-based and tuple-based schemes to partition keys on demand. The proposed scheme adopts a lightweight load-balancing algorithm and a partitioning scheme that retains the characteristics of key-based operations, thus achieving the load balance of the tuple-based strategy while reducing the additional cost of fine-grained balancing.
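The trade-off described above between key-based routing, tuple-based routing, and a hybrid of the two can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not the paper's algorithm. The hybrid rule here is a simple stand-in: keys whose share of the stream reaches a hypothetical `hot_fraction` threshold are rotated over `split` workers, and key frequencies are assumed to be known in advance. All function names are hypothetical. Key migration, the bin-packing-like scheme, is not modeled. The sketch reports two numbers: the max/mean load ratio, and the count of key-worker pairs that hold state, used as a rough proxy for the extra memory cost.

```python
import zlib
from collections import Counter
from typing import Dict, List, Set, Tuple

Placement = Dict[str, Set[int]]  # key -> set of workers holding state for that key


def key_worker(key: str, n_workers: int) -> int:
    """Stable hash of the key; crc32 is deterministic across processes, unlike str hash()."""
    return zlib.crc32(key.encode("utf-8")) % n_workers


def _record(loads: List[int], placement: Placement, key: str, worker: int) -> None:
    loads[worker] += 1
    placement.setdefault(key, set()).add(worker)


def route_key_based(stream: List[str], n: int) -> Tuple[List[int], Placement]:
    """Key-based: every tuple of a key goes to one worker, so a hot key overloads it."""
    loads, placement = [0] * n, {}
    for key in stream:
        _record(loads, placement, key, key_worker(key, n))
    return loads, placement


def route_tuple_based(stream: List[str], n: int) -> Tuple[List[int], Placement]:
    """Tuple-based (round-robin here): even load, but a key's state is scattered over workers."""
    loads, placement = [0] * n, {}
    for i, key in enumerate(stream):
        _record(loads, placement, key, i % n)
    return loads, placement


def route_hybrid(stream: List[str], n: int, hot_fraction: float = 0.1,
                 split: int = 0) -> Tuple[List[int], Placement]:
    """Hypothetical hybrid: only hot keys are spread over `split` workers (0 = all n)."""
    split = split or n
    # Assumption: frequencies are known up front; a streaming system would estimate them online.
    counts = Counter(stream)
    hot = {k for k, c in counts.items() if c >= hot_fraction * len(stream)}
    loads, placement = [0] * n, {}
    next_slot: Counter = Counter()
    for key in stream:
        worker = key_worker(key, n)
        if key in hot:
            # Rotate the hot key over `split` consecutive workers starting at its home worker.
            worker = (worker + next_slot[key] % split) % n
            next_slot[key] += 1
        _record(loads, placement, key, worker)
    return loads, placement


def summarize(loads: List[int], placement: Placement) -> Tuple[float, int]:
    """Return (max/mean load, number of key-worker pairs holding state)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean, sum(len(ws) for ws in placement.values())


if __name__ == "__main__":
    # 80% of tuples share one key; ten cold keys appear twice each.
    stream = ["hot"] * 80 + [f"k{i}" for i in range(10)] * 2
    for name, route in [("key-based", route_key_based),
                        ("tuple-based", route_tuple_based),
                        ("hybrid", route_hybrid)]:
        ratio, pairs = summarize(*route(stream, 4))
        print(f"{name:12s} imbalance={ratio:.2f} key-worker pairs={pairs}")
```

For this stream and 4 workers, key-based routing places all 80 hot tuples on one worker, giving a max/mean ratio of at least 3.2. Round-robin reaches exactly 1.0 but stores each cold key on two workers. The hybrid stays at or below 1.6 while keeping each cold key on a single worker. In a real join, splitting a hot key would also require the other stream's tuples for that key to reach every worker holding a share of it. That is the kind of extra communication cost the abstract attributes to fine-grained balancing.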

Original language: English
Pages (from-to): 563-578
Number of pages: 16
Journal: Ruan Jian Xue Bao/Journal of Software
Volume: 28
Issue number: 3
State: Published - 1 Mar 2017

Keywords

  • Distributed data stream
  • Key-based operation
  • Workload balance
  • Workload migration
  • Workload skew

