Multi-Task Reinforcement Learning for Collaborative Network Optimization in Data Centers

Ting Wang, Kai Cheng, Xiao Du

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

As data center networks increasingly grow in complexity and scale, efficiently managing traffic scheduling and congestion control becomes crucial for optimizing network performance. Traditional single-task optimization strategies often fall short, failing to adequately address the interplay between different tasks and resulting in suboptimal performance with inefficiencies and robustness issues. To tackle these challenges, this paper proposes a novel Multi-Task Reinforcement Learning (MTRL)-based collaborative Network Optimization scheme, termed MTRLNO, which establishes a structured framework with central and edge systems (i.e., hosts and switches). The SDN-enabled central system incorporates an MTRL agent that simultaneously optimizes traffic scheduling and congestion control tasks, leveraging global network state information to formulate instructive optimization policies for edge systems. Switches implement decentralized multi-agent RL agents to facilitate automatic ECN tuning for congestion control, with the ability to handle incast issues. Hosts feature an MTRL-guided Multiple Level Feedback Queue (MLFQ) demotion threshold adjustment scheme for adaptive traffic scheduling. We further develop a Prioritized Experience Replay-based Soft Actor-Critic (PERSAC) algorithm to enhance learning efficiency and a customized multi-task learning algorithm via improved parameter-sharing to effectively adapt across multiple tasks. Experimental results demonstrate that MTRLNO significantly outperforms state-of-the-art approaches in terms of FCT, latency, and robustness across diverse network conditions.
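As an illustration of the host-side mechanism the abstract mentions, the following is a minimal sketch of how a set of MLFQ demotion thresholds could map a flow's bytes sent to a priority queue. It is not the paper's implementation. The function name `mlfq_priority`, the example threshold values, and the convention that a flow is demoted once its sent bytes reach a threshold are all assumptions made for illustration. The MTRL-guided logic that chooses the thresholds is not shown.

```python
from bisect import bisect_right


def mlfq_priority(bytes_sent, thresholds):
    """Return the queue index for a flow (0 = highest priority).

    `thresholds` is an ascending list of demotion thresholds in bytes.
    A flow starts in queue 0 and moves down one queue each time its
    bytes sent reach the next threshold (an assumed convention).
    With K thresholds there are K + 1 queues.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in ascending order")
    # Number of thresholds that are <= bytes_sent.
    return bisect_right(thresholds, bytes_sent)


# Hypothetical thresholds: 100 KB, 1 MB, 10 MB (four queues).
example_thresholds = [100_000, 1_000_000, 10_000_000]

short_flow_queue = mlfq_priority(50_000, example_thresholds)
long_flow_queue = mlfq_priority(20_000_000, example_thresholds)
```

Under these assumed thresholds a short flow stays in the highest-priority queue while a long flow ends up in the lowest one. Raising or lowering the thresholds changes where that split falls, which is the quantity an adaptive scheme would tune.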

Original language: English
Title of host publication: INFOCOM 2025 - IEEE Conference on Computer Communications
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331543051
State: Published - 2025
Event: 2025 IEEE Conference on Computer Communications, INFOCOM 2025 - London, United Kingdom
Duration: 19 May 2025 to 22 May 2025

Publication series

Name: Proceedings - IEEE INFOCOM
ISSN (Print): 0743-166X

Conference

Conference: 2025 IEEE Conference on Computer Communications, INFOCOM 2025
Country/Territory: United Kingdom
City: London
Period: 19/05/25 to 22/05/25
