TY - GEN
T1 - Espresso
T2 - 2025 IEEE Conference on Computer Communications, INFOCOM 2025
AU - Zhou, Qiannan
AU - Xu, Fei
AU - Weng, Lingxuan
AU - Li, Ruixing
AU - Wu, Xudong
AU - Chen, Li
AU - Zhou, Zhi
AU - Liu, Fangming
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - As Transformer-based models deepen and datasets expand, training large models demands numerous accelerators, particularly GPUs, incurring high cloud expenses. However, conventional homogeneous resource provisioning is inefficient due to limited cloud resources and low GPU utilization. This challenge necessitates heterogeneous GPU provisioning for training in clouds. Current research on large model training often focuses on load balancing across stages, neglecting the varying computing and memory demands of each stage. Additionally, the allocation of heterogeneous GPUs for training has surprisingly received little attention. This paper introduces Espresso, a cost-efficient GPU provisioning framework that unifies heterogeneous GPU allocation (GPU allocator) and appropriate stage placement (stage placer) for large model training in the cloud. Specifically, the GPU allocator employs a cost tree-based provisioning strategy to prioritize the search for lower-cost allocation plans and prune unnecessary branches with multi-dimensional pruning methods. The resource-aware stage placer further devises a compute-memory ratio to optimize communication and computation efficiency during training. We have open-sourced a prototype of Espresso and conducted prototype experiments on four representative large models in public clouds. Extensive experimental results demonstrate that Espresso guarantees training performance for large models while saving costs by up to 49.8% compared to state-of-the-art solutions, with acceptable runtime overhead.
AB - As Transformer-based models deepen and datasets expand, training large models demands numerous accelerators, particularly GPUs, incurring high cloud expenses. However, conventional homogeneous resource provisioning is inefficient due to limited cloud resources and low GPU utilization. This challenge necessitates heterogeneous GPU provisioning for training in clouds. Current research on large model training often focuses on load balancing across stages, neglecting the varying computing and memory demands of each stage. Additionally, the allocation of heterogeneous GPUs for training has surprisingly received little attention. This paper introduces Espresso, a cost-efficient GPU provisioning framework that unifies heterogeneous GPU allocation (GPU allocator) and appropriate stage placement (stage placer) for large model training in the cloud. Specifically, the GPU allocator employs a cost tree-based provisioning strategy to prioritize the search for lower-cost allocation plans and prune unnecessary branches with multi-dimensional pruning methods. The resource-aware stage placer further devises a compute-memory ratio to optimize communication and computation efficiency during training. We have open-sourced a prototype of Espresso and conducted prototype experiments on four representative large models in public clouds. Extensive experimental results demonstrate that Espresso guarantees training performance for large models while saving costs by up to 49.8% compared to state-of-the-art solutions, with acceptable runtime overhead.
KW - Large model training
KW - heterogeneous GPU environments
KW - resource provisioning
KW - stage placement
UR - https://www.scopus.com/pages/publications/105011091234
U2 - 10.1109/INFOCOM55648.2025.11044693
DO - 10.1109/INFOCOM55648.2025.11044693
M3 - Conference contribution
AN - SCOPUS:105011091234
T3 - Proceedings - IEEE INFOCOM
BT - INFOCOM 2025 - IEEE Conference on Computer Communications
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 19 May 2025 through 22 May 2025
ER -