Espresso: Cost-Efficient Large Model Training by Exploiting GPU Heterogeneity in the Cloud

  • Qiannan Zhou
  • Fei Xu*
  • Lingxuan Weng
  • Ruixing Li
  • Xudong Wu
  • Li Chen
  • Zhi Zhou
  • Fangming Liu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

As Transformer-based models grow deeper and datasets expand, training large models demands numerous accelerators, particularly GPUs, incurring high cloud expenses. However, conventional homogeneous resource provisioning is inefficient due to limited cloud resources and low GPU utilization. This challenge necessitates heterogeneous GPU provisioning for training in clouds. Current research on large model training often focuses on load balancing across stages, neglecting the varying computing and memory demands of each stage. Moreover, the allocation of heterogeneous GPUs for training has received surprisingly little attention. This paper introduces Espresso, a cost-efficient GPU provisioning framework that unifies heterogeneous GPU allocation (GPU allocator) and adequate stage placement (stage placer) for large model training in the cloud. Specifically, the GPU allocator proposes a cost tree-based provisioning strategy that prioritizes searching allocation plans with lower costs and prunes unnecessary branches with multi-dimensional pruning methods. The resource-aware stage placer further devises a compute-memory ratio to optimize communication and computation efficiency during training. We have open-sourced a prototype of Espresso and conducted prototype experiments on four representative large models in public clouds. Extensive experimental results demonstrate that Espresso guarantees the performance of large model training while saving costs by up to 49.8% compared to state-of-the-art solutions, with acceptable runtime overhead.
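To make the cost tree-based provisioning idea concrete, the following is a minimal sketch of a cheapest-first search over heterogeneous GPU allocation plans that expands low-cost plans first and prunes branches exceeding a per-type count limit. The GPU catalog, prices, throughput proxy, and pruning rules are hypothetical illustrations, not the Espresso paper's actual cost model or algorithm.

```python
# Illustrative sketch only: a cheapest-first search over heterogeneous GPU
# allocation plans with simple pruning. All prices, memory sizes, and
# throughput numbers below are hypothetical, not from the Espresso paper.
import heapq
from dataclasses import dataclass

@dataclass(frozen=True)
class GpuType:
    name: str
    hourly_price: float   # $/hour per GPU (assumed)
    memory_gb: int        # per-GPU memory (assumed)
    tflops: float         # rough compute proxy (assumed)

CATALOG = [
    GpuType("A100-40G", 3.0, 40, 312.0),
    GpuType("V100-16G", 1.5, 16, 125.0),
    GpuType("T4-16G",   0.5, 16, 65.0),
]

def find_cheapest_plan(mem_need_gb, tflops_need, max_gpus_per_type=8):
    """Return the cheapest per-type GPU counts meeting both requirements.

    Plans are popped from a priority queue in order of total hourly cost,
    so the first plan satisfying the memory and compute requirements is
    the cheapest one. Branches exceeding max_gpus_per_type are pruned.
    """
    start = (0,) * len(CATALOG)
    heap = [(0.0, start)]
    seen = {start}
    while heap:
        cost, plan = heapq.heappop(heap)
        mem = sum(c * g.memory_gb for c, g in zip(plan, CATALOG))
        flops = sum(c * g.tflops for c, g in zip(plan, CATALOG))
        if mem >= mem_need_gb and flops >= tflops_need:
            return cost, dict(zip((g.name for g in CATALOG), plan))
        # Expand the plan by one GPU of each type, skipping pruned branches.
        for i, g in enumerate(CATALOG):
            if plan[i] + 1 > max_gpus_per_type:
                continue
            child = plan[:i] + (plan[i] + 1,) + plan[i + 1:]
            if child not in seen:
                seen.add(child)
                heapq.heappush(heap, (cost + g.hourly_price, child))
    return None

if __name__ == "__main__":
    print(find_cheapest_plan(mem_need_gb=120, tflops_need=600))
```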

Original language: English
Title of host publication: INFOCOM 2025 - IEEE Conference on Computer Communications
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331543051
DOIs
State: Published - 2025
Event: 2025 IEEE Conference on Computer Communications, INFOCOM 2025 - London, United Kingdom
Duration: 19 May 2025 - 22 May 2025

Publication series

Name: Proceedings - IEEE INFOCOM
ISSN (Print): 0743-166X

Conference

Conference: 2025 IEEE Conference on Computer Communications, INFOCOM 2025
Country/Territory: United Kingdom
City: London
Period: 19/05/25 - 22/05/25

Keywords

  • Large model training
  • heterogeneous GPU environments
  • resource provisioning
  • stage placement
