TY - GEN
T1 - Cost-effective stream join algorithm on cloud system
AU - Fang, Junhua
AU - Zhang, Rong
AU - Wang, Xiaotong
AU - Fu, Tom Z.J.
AU - Zhang, Zhenjie
AU - Zhou, Aoying
N1 - Publisher Copyright: © 2016 ACM.
PY - 2016/10/24
Y1 - 2016/10/24
AB - The matrix-based model perfectly supports the distributed stream join operator: it applies to arbitrary join predicates and guarantees the completeness of the join results. However, the high dynamicity and uncertainty of real-world data streams call for better adaptivity and lower operational cost; without them, the stream join operator may suffer performance drops and overpay for computation resources. The existing Join-Matrix model cannot provide this capability because of its fixed workload partitioning and the difficulty of dynamic repartitioning. It is thus unclear how to retain the load-balancing benefits of the Join-Matrix model while making distributed stream join computation more flexible at a lower cost. In this paper, we present a new cost-effective stream join algorithm that enhances the adaptability of the Join-Matrix model and minimizes resource usage under varying workloads. Our proposal includes a varietal matrix generation algorithm that builds an irregular matrix scheme for minimal task assignment; a lightweight migration algorithm that cuts unnecessary migration cost; and a load-balancing framework that maximizes processing throughput. Extensive experiments compare our proposal against state-of-the-art solutions on benchmark and real-world workloads and demonstrate the effectiveness of our method, especially in reducing operational cost under the pay-as-you-go pricing scheme.
KW - Cost-effective
KW - Distributed stream join
KW - Matrix model
KW - Theta-join
UR - https://www.scopus.com/pages/publications/84996525954
U2 - 10.1145/2983323.2983773
DO - 10.1145/2983323.2983773
M3 - Conference contribution
AN - SCOPUS:84996525954
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1773
EP - 1782
BT - CIKM 2016 - Proceedings of the 2016 ACM Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 25th ACM International Conference on Information and Knowledge Management, CIKM 2016
Y2 - 24 October 2016 through 28 October 2016
ER -