TY - JOUR
T1 - A-DSP
T2 - An adaptive join algorithm for dynamic data stream on cloud system
AU - Fang, Junhua
AU - Zhang, Rong
AU - Zhao, Yan
AU - Zheng, Kai
AU - Zhou, Xiaofang
AU - Zhou, Aoying
N1 - Publisher Copyright:
© 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
PY - 2021/5/1
Y1 - 2021/5/1
N2 - The join operations, including both equi and non-equi joins, are essential to the complex data analytics in the big data era. However, they are not inherently supported by existing DSPEs (Distributed Stream Processing Engines). The state-of-the-art join solutions on DSPEs rely on either complicated routing strategies or resource-inefficient processing structures, which are susceptible to dynamic workload, especially when the DSPEs face various join predicate operations and skewed data distribution. In this paper, we propose a new cost-effective stream join framework, named A-DSP (Adaptive Dimensional Space Processing), which enhances the adaptability of real-time join model and minimizes the resource used over the dynamic workloads. Our proposal includes: 1) a join model generation algorithm devised to adaptively switch between different join schemes so as to minimize the number of processing task required; 2) a load-balancing mechanism which maximizes the processing throughput; and 3) a lightweight algorithm designed for cutting down unnecessary migration cost. Extensive experiments are conducted to compare our proposal against state-of-the-art solutions on both benchmark and real-world workloads. The experimental results verify the effectiveness of our method, especially on reducing the operational cost under pay-as-you-go pricing scheme.
AB - The join operations, including both equi and non-equi joins, are essential to the complex data analytics in the big data era. However, they are not inherently supported by existing DSPEs (Distributed Stream Processing Engines). The state-of-the-art join solutions on DSPEs rely on either complicated routing strategies or resource-inefficient processing structures, which are susceptible to dynamic workload, especially when the DSPEs face various join predicate operations and skewed data distribution. In this paper, we propose a new cost-effective stream join framework, named A-DSP (Adaptive Dimensional Space Processing), which enhances the adaptability of real-time join model and minimizes the resource used over the dynamic workloads. Our proposal includes: 1) a join model generation algorithm devised to adaptively switch between different join schemes so as to minimize the number of processing task required; 2) a load-balancing mechanism which maximizes the processing throughput; and 3) a lightweight algorithm designed for cutting down unnecessary migration cost. Extensive experiments are conducted to compare our proposal against state-of-the-art solutions on both benchmark and real-world workloads. The experimental results verify the effectiveness of our method, especially on reducing the operational cost under pay-as-you-go pricing scheme.
KW - Cost effective
KW - Distributed stream join
KW - Theta-join
UR - https://www.scopus.com/pages/publications/85103966801
U2 - 10.1109/TKDE.2019.2947055
DO - 10.1109/TKDE.2019.2947055
M3 - 文章
AN - SCOPUS:85103966801
SN - 1041-4347
VL - 33
SP - 1861
EP - 1876
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 5
M1 - 08868214
ER -