TY - GEN
T1 - Learning from the Past
T2 - 41st IEEE International Conference on Data Engineering, ICDE 2025
AU - Han, Yuxing
AU - Chen, Lixiang
AU - Wang, Haoyu
AU - Chen, Zhanghao
AU - Zhang, Yifan
AU - Yang, Chengcheng
AU - Hao, Kongzhang
AU - Yang, Zhengyi
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Distributed stream processing systems rely on the dataflow model to define and execute streaming jobs, organizing computations as Directed Acyclic Graphs (DAGs) of operators. Adjusting the parallelism of these operators is crucial to handling fluctuating workloads efficiently while balancing resource usage and processing performance. However, existing methods often fail to effectively utilize execution histories or fully exploit DAG structures, limiting their ability to identify bottlenecks and determine the optimal parallelism. In this paper, we propose StreamTune, a novel approach for adaptive parallelism tuning in stream processing systems. StreamTune incorporates a pre-training and fine-tuning framework that leverages global knowledge from historical execution data for job-specific parallelism tuning. In the pre-training phase, StreamTune clusters the historical data with Graph Edit Distance and pre-trains a Graph Neural Network-based encoder per cluster to capture the correlation between the operator parallelism, DAG structures, and the identified operator-level bottlenecks. In the online tuning phase, StreamTune iteratively refines operator parallelism recommendations using an operator-level bottleneck prediction model enforced with a monotonic constraint, which aligns with the observed system performance behavior. Evaluation results demonstrate that StreamTune reduces reconfigurations by up to 29.6% and parallelism degrees by up to 30.8% in Apache Flink under a synthetic workload. In Timely Dataflow, StreamTune achieves up to an 83.3% reduction in parallelism degrees while maintaining comparable processing performance under the Nexmark benchmark, when compared to the state-of-the-art methods.
AB - Distributed stream processing systems rely on the dataflow model to define and execute streaming jobs, organizing computations as Directed Acyclic Graphs (DAGs) of operators. Adjusting the parallelism of these operators is crucial to handling fluctuating workloads efficiently while balancing resource usage and processing performance. However, existing methods often fail to effectively utilize execution histories or fully exploit DAG structures, limiting their ability to identify bottlenecks and determine the optimal parallelism. In this paper, we propose StreamTune, a novel approach for adaptive parallelism tuning in stream processing systems. StreamTune incorporates a pre-training and fine-tuning framework that leverages global knowledge from historical execution data for job-specific parallelism tuning. In the pre-training phase, StreamTune clusters the historical data with Graph Edit Distance and pre-trains a Graph Neural Network-based encoder per cluster to capture the correlation between the operator parallelism, DAG structures, and the identified operator-level bottlenecks. In the online tuning phase, StreamTune iteratively refines operator parallelism recommendations using an operator-level bottleneck prediction model enforced with a monotonic constraint, which aligns with the observed system performance behavior. Evaluation results demonstrate that StreamTune reduces reconfigurations by up to 29.6% and parallelism degrees by up to 30.8% in Apache Flink under a synthetic workload. In Timely Dataflow, StreamTune achieves up to an 83.3% reduction in parallelism degrees while maintaining comparable processing performance under the Nexmark benchmark, when compared to the state-of-the-art methods.
UR - https://www.scopus.com/pages/publications/105015418115
U2 - 10.1109/ICDE65448.2025.00264
DO - 10.1109/ICDE65448.2025.00264
M3 - Conference contribution
AN - SCOPUS:105015418115
T3 - Proceedings - International Conference on Data Engineering
SP - 3535
EP - 3548
BT - Proceedings - 2025 IEEE 41st International Conference on Data Engineering, ICDE 2025
PB - IEEE Computer Society
Y2 - 19 May 2025 through 23 May 2025
ER -