Learning Cooperative Oversubscription for Cloud by Chance-Constrained Multi-Agent Reinforcement Learning

Junjie Sheng, Lu Wang, Fangkai Yang, Bo Qiao, Hang Dong, Xiangfeng Wang*, Bo Jin, Jun Wang, Si Qin, Saravan Rajmohan, Qingwei Lin*, Dongmei Zhang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

6 Scopus citations

Abstract

Oversubscription is a common practice for improving cloud resource utilization. It allows the cloud service provider to sell more resources than the physical limit, assuming not all users would fully utilize the resources simultaneously. However, how to design an oversubscription policy that improves utilization while satisfying some safety constraints remains an open problem. Existing methods and industrial practices are over-conservative, ignoring the coordination of diverse resource usage patterns and probabilistic constraints. To address these two limitations, this paper formulates the oversubscription for cloud as a chance-constrained optimization problem and proposes an effective Chance-Constrained Multi-Agent Reinforcement Learning (C2MARL) method to solve this problem. Specifically, C2MARL reduces the number of constraints by considering their upper bounds and leverages a multi-agent reinforcement learning paradigm to learn a safe and optimal coordination policy. We evaluate our C2MARL on an internal cloud platform and public cloud datasets. Experiments show that our C2MARL outperforms existing methods in improving utilization () under different levels of safety constraints.

Original languageEnglish
Title of host publicationACM Web Conference 2023 - Proceedings of the World Wide Web Conference, WWW 2023
PublisherAssociation for Computing Machinery, Inc
Pages2927-2936
Number of pages10
ISBN (Electronic)9781450394161
DOIs
StatePublished - 30 Apr 2023
Event32nd ACM World Wide Web Conference, WWW 2023 - Austin, United States
Duration: 30 Apr 20234 May 2023

Publication series

NameACM Web Conference 2023 - Proceedings of the World Wide Web Conference, WWW 2023

Conference

Conference32nd ACM World Wide Web Conference, WWW 2023
Country/TerritoryUnited States
CityAustin
Period30/04/234/05/23

Keywords

  • Cloud Computing
  • Multi-Agent System
  • Over Subscription
  • Reinforcement Learning

Fingerprint

Dive into the research topics of 'Learning Cooperative Oversubscription for Cloud by Chance-Constrained Multi-Agent Reinforcement Learning'. Together they form a unique fingerprint.

Cite this