Tetris: Proactive Container Scheduling for Long-Term Load Balancing in Shared Clusters

  • Fei Xu*
  • Xiyue Shen
  • Shuohao Lin
  • Li Chen
  • Zhi Zhou
  • Fen Xiao
  • Fangming Liu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

Long-running containerized workloads (e.g., machine learning), which typically exhibit time-varying patterns, are increasingly prevalent in shared production clusters. To improve workload performance, current schedulers mainly optimize the short-term benefits of cluster load balancing or the initial placement of containers on servers. However, this inevitably causes many invalid migrations (i.e., containers are migrated back and forth among servers within a short time window), leading to significant service level objective (SLO) violations. This paper introduces Tetris, a model predictive control (MPC)-based container scheduling strategy that proactively migrates long-running workloads to balance cluster load. Specifically, we first build a discrete-time dynamic model for the long-term optimization of container scheduling. To solve this optimization problem, Tetris employs two main components: (1) a container resource predictor, which leverages time-series analysis to accurately predict container resource consumption; and (2) an MPC-based container scheduler that jointly optimizes cluster load balancing and container migration cost over a sliding time window. We implement and open-source a prototype of Tetris based on Kubernetes (K8s). Extensive prototype experiments and trace-driven simulations demonstrate that, compared to state-of-the-art container scheduling strategies, Tetris improves the cluster load balancing degree by up to 77.8% without incurring any SLO violations.
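To make the receding-horizon idea concrete, here is a minimal Python sketch. It is a toy illustration, not Tetris's actual model or scheduler, and every modeling choice in it is an assumption: the load balancing degree is taken to be the population standard deviation of per-server load; migration cost is a fixed penalty `mig_weight` per moved container; a seasonal-naive forecaster stands in for the paper's time-series predictor; and the search is restricted to at most one migration per control step, with the resulting placement held fixed across the window. The names `predict`, `imbalance`, `plan_cost`, and `mpc_step` are hypothetical.

```python
from statistics import pstdev


def predict(history, horizon, period):
    """Seasonal-naive forecast: repeat the last `period` observations.

    Hypothetical stand-in for Tetris's time-series predictor (assumption).
    """
    period = min(period, len(history))  # cannot look back further than the data
    season = history[-period:]
    return [season[t % period] for t in range(horizon)]


def imbalance(placement, loads, n_servers):
    """Assumed balance metric: population std. dev. of per-server load."""
    per_server = [0.0] * n_servers
    for container, server in placement.items():
        per_server[server] += loads[container]
    return pstdev(per_server)


def plan_cost(plan, current, forecasts, n_servers, horizon, mig_weight):
    """Predicted imbalance summed over the window plus a per-migration penalty."""
    migrations = sum(1 for c in plan if plan[c] != current[c])
    total = 0.0
    for t in range(horizon):
        loads_t = {c: f[t] for c, f in forecasts.items()}
        total += imbalance(plan, loads_t, n_servers)
    return total + mig_weight * migrations


def mpc_step(current, histories, n_servers, horizon=4, period=4, mig_weight=1.0):
    """One control step: return the cheapest plan with at most one migration.

    The caller applies the returned placement, then calls mpc_step again at
    the next interval with updated histories (receding horizon).
    """
    forecasts = {c: predict(h, horizon, period) for c, h in histories.items()}
    best = dict(current)  # "do nothing" is always a candidate
    best_cost = plan_cost(best, current, forecasts, n_servers, horizon, mig_weight)
    for c in current:
        for s in range(n_servers):
            if s == current[c]:
                continue
            candidate = dict(current)
            candidate[c] = s  # try moving container c to server s
            cost = plan_cost(candidate, current, forecasts, n_servers, horizon, mig_weight)
            if cost < best_cost:  # strict: ties keep the earlier plan
                best, best_cost = candidate, cost
    return best


if __name__ == "__main__":
    # Server 0 hosts "a" and "b"; "b" is forecast to spike for one interval only.
    current = {"a": 0, "b": 0, "c": 1}
    histories = {"a": [2, 2, 2, 2], "b": [6, 0, 0, 0], "c": [2, 2, 2, 2]}
    print(mpc_step(current, histories, n_servers=2, horizon=1))  # {'a': 1, 'b': 0, 'c': 1}
    print(mpc_step(current, histories, n_servers=2, horizon=4))  # {'a': 0, 'b': 0, 'c': 1}
```

With `horizon=1` the controller reacts to b's one-interval spike by moving a, a migration it would likely have to reverse once the spike passes. Summing the predicted imbalance over a four-interval window makes that migration not worth its cost, which mirrors, in toy form, the invalid-migration problem described above. Raising `mig_weight` makes the controller more reluctant to migrate.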

Original language: English
Pages (from-to): 2918-2930
Number of pages: 13
Journal: IEEE Transactions on Services Computing
Volume: 17
Issue number: 5
State: Published (2024)

Keywords

  • Container scheduling
  • load balancing
  • long-running containerized workloads
  • migration cost
