A-DSP: An adaptive join algorithm for dynamic data stream on cloud system

Junhua Fang, Rong Zhang, Yan Zhao, Kai Zheng, Xiaofang Zhou, Aoying Zhou

Research output: Contribution to journal › Article › peer-review

8 Scopus citations

Abstract

Join operations, including both equi-joins and non-equi-joins, are essential to complex data analytics in the big data era. However, they are not natively supported by existing Distributed Stream Processing Engines (DSPEs). State-of-the-art join solutions on DSPEs rely on either complicated routing strategies or resource-inefficient processing structures, which are susceptible to dynamic workloads, especially when the DSPEs face diverse join predicates and skewed data distributions. In this paper, we propose a new cost-effective stream join framework, named A-DSP (Adaptive Dimensional Space Processing), which enhances the adaptability of the real-time join model and minimizes the resources used under dynamic workloads. Our proposal includes: 1) a join model generation algorithm that adaptively switches between different join schemes so as to minimize the number of processing tasks required; 2) a load-balancing mechanism that maximizes processing throughput; and 3) a lightweight algorithm designed to cut unnecessary migration cost. Extensive experiments compare our proposal against state-of-the-art solutions on both benchmark and real-world workloads. The experimental results verify the effectiveness of our method, especially in reducing operational cost under a pay-as-you-go pricing scheme.
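For readers unfamiliar with the terminology, the difference between an equi-join and a non-equi (theta) join can be illustrated with a minimal sketch. This is not the A-DSP algorithm and is not taken from the paper; it is a single-machine nested-loop join over two small, finite windows. The tuple layout, the function names, and the band width of 1 are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical tuple layout for illustration: (key, value).
Row = Tuple[str, int]


def window_join(
    left: List[Row],
    right: List[Row],
    predicate: Callable[[Row, Row], bool],
) -> List[Tuple[Row, Row]]:
    """Nested-loop join: return every (left, right) pair satisfying the predicate."""
    return [(l, r) for l in left for r in right if predicate(l, r)]


def equi(l: Row, r: Row) -> bool:
    # Equi-join predicate: values must match exactly.
    return l[1] == r[1]


def band(l: Row, r: Row) -> bool:
    # Non-equi (theta) predicate: values within an assumed band of 1.
    return abs(l[1] - r[1]) <= 1


left_window: List[Row] = [("a", 1), ("b", 5)]
right_window: List[Row] = [("x", 1), ("y", 2), ("z", 9)]

equi_pairs = window_join(left_window, right_window, equi)
band_pairs = window_join(left_window, right_window, band)
```

An equi-join can be distributed by hashing on the join key, whereas a general theta predicate such as `band` may need to compare tuples whose keys differ, which is one reason the two cases are usually handled by different processing schemes.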

Original language: English
Article number: 08868214
Pages (from-to): 1861-1876
Number of pages: 16
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 33
Issue number: 5
State: Published - 1 May 2021

Keywords

  • Cost effective
  • Distributed stream join
  • Theta-join
