TY - GEN
T1 - Multi-Task Reinforcement Learning for Collaborative Network Optimization in Data Centers
AU - Wang, Ting
AU - Cheng, Kai
AU - Du, Xiao
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
AB - As data center networks increasingly grow in complexity and scale, efficiently managing traffic scheduling and congestion control becomes crucial for optimizing network performance. Traditional single-task optimization strategies often fall short, failing to adequately address the interplay between different tasks and resulting in suboptimal performance with inefficiencies and robustness issues. To tackle these challenges, this paper proposes a novel Multi-Task Reinforcement Learning (MTRL)-based collaborative Network Optimization scheme, termed MTRLNO, which establishes a structured framework with central and edge systems (i.e., hosts and switches). The SDN-enabled central system incorporates an MTRL agent that simultaneously optimizes traffic scheduling and congestion control tasks, leveraging global network state information to formulate instructive optimization policies for edge systems. Switches implement decentralized multi-agent RL agents to facilitate automatic ECN tuning for congestion control, with the ability to handle incast issues. Hosts feature an MTRL-guided Multiple Level Feedback Queue (MLFQ) demotion threshold adjustment scheme for adaptive traffic scheduling. We further develop a Prioritized Experience Replay-based Soft Actor-Critic (PERSAC) algorithm to enhance learning efficiency and a customized multi-task learning algorithm via improved parameter-sharing to effectively adapt across multiple tasks. Experimental results demonstrate that MTRLNO significantly outperforms state-of-the-art approaches in terms of FCT, latency, and robustness across diverse network conditions.
UR - https://www.scopus.com/pages/publications/105011040149
U2 - 10.1109/INFOCOM55648.2025.11044699
DO - 10.1109/INFOCOM55648.2025.11044699
M3 - Conference contribution
AN - SCOPUS:105011040149
T3 - Proceedings - IEEE INFOCOM
BT - INFOCOM 2025 - IEEE Conference on Computer Communications
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 IEEE Conference on Computer Communications, INFOCOM 2025
Y2 - 19 May 2025 through 22 May 2025
ER -