TY - GEN
T1 - Multi-agent Independent PPO-based Automatic ECN Tuning for High-Speed Data Center Networks
AU - Wang, Ting
AU - Cheng, Kai
AU - Du, Xiao
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Explicit Congestion Notification (ECN)-based congestion control schemes have been widely adopted in high-speed data center networks (DCNs), where the ECN marking threshold plays a decisive role in guaranteeing a lossless DCN. However, existing approaches either employ static settings with fixed thresholds that cannot adapt to network dynamics, or fail to take into account many-to-one traffic patterns and the differing requirements of different traffic types, resulting in relatively poor performance. To address these problems, this paper proposes a novel learning-based automatic ECN tuning scheme, named PET, based on the multi-agent Independent Proximal Policy Optimization (IPPO) algorithm. PET dynamically adjusts ECN thresholds by considering pivotal congestion-contributing factors, including queue length, output data rate, output rate of ECN-marked packets, current ECN threshold, the extent of incast, and the ratio of mice to elephant flows. PET adopts the Decentralized Training and Decentralized Execution (DTDE) paradigm and combines offline and online training to accommodate network dynamics. PET is also fair and readily deployable with commodity hardware. Comprehensive experimental results demonstrate that, compared with state-of-the-art static schemes and the learning-based automatic scheme, PET achieves better performance in terms of flow completion time, convergence rate, queue length variance, and system robustness.
AB - Explicit Congestion Notification (ECN)-based congestion control schemes have been widely adopted in high-speed data center networks (DCNs), where the ECN marking threshold plays a decisive role in guaranteeing a lossless DCN. However, existing approaches either employ static settings with fixed thresholds that cannot adapt to network dynamics, or fail to take into account many-to-one traffic patterns and the differing requirements of different traffic types, resulting in relatively poor performance. To address these problems, this paper proposes a novel learning-based automatic ECN tuning scheme, named PET, based on the multi-agent Independent Proximal Policy Optimization (IPPO) algorithm. PET dynamically adjusts ECN thresholds by considering pivotal congestion-contributing factors, including queue length, output data rate, output rate of ECN-marked packets, current ECN threshold, the extent of incast, and the ratio of mice to elephant flows. PET adopts the Decentralized Training and Decentralized Execution (DTDE) paradigm and combines offline and online training to accommodate network dynamics. PET is also fair and readily deployable with commodity hardware. Comprehensive experimental results demonstrate that, compared with state-of-the-art static schemes and the learning-based automatic scheme, PET achieves better performance in terms of flow completion time, convergence rate, queue length variance, and system robustness.
UR - https://www.scopus.com/pages/publications/105019740951
U2 - 10.1109/CLUSTER59342.2025.11186496
DO - 10.1109/CLUSTER59342.2025.11186496
M3 - Conference contribution
AN - SCOPUS:105019740951
T3 - Proceedings - IEEE International Conference on Cluster Computing, ICCC
BT - Proceedings of the 2025 IEEE International Conference on Cluster Computing, CLUSTER 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 IEEE International Conference on Cluster Computing, CLUSTER 2025
Y2 - 3 September 2025 through 5 September 2025
ER -