HarmonyBatch: Batching multi-SLO DNN Inference with Heterogeneous Serverless Functions

  • Jiabin Chen
  • Fei Xu*
  • Yikun Gu
  • Li Chen
  • Fangming Liu
  • Zhi Zhou

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Scopus citations

Abstract

Deep Neural Network (DNN) inference on serverless functions is gaining prominence due to its potential for substantial budget savings. Existing works on serverless DNN inference solely optimize batching requests from one application with a single Service Level Objective (SLO) on CPU functions. However, production serverless DNN inference traces indicate that the request arrival rate of applications is surprisingly low, which inevitably causes a long batching time and SLO violations. Hence, there is an urgent need for batching multiple DNN inference requests with diverse SLOs (i.e., multi-SLO DNN inference) in serverless platforms. Moreover, the potential performance and cost benefits of deploying heterogeneous (i.e., CPU and GPU) functions for DNN inference have received scant attention.

In this paper, we present HarmonyBatch, a cost-efficient resource provisioning framework designed to achieve predictable performance for multi-SLO DNN inference with heterogeneous serverless functions. Specifically, we construct an analytical performance and cost model of DNN inference on both CPU and GPU functions, by explicitly considering the GPU time-slicing scheduling mechanism and the request arrival rate distribution. Based on such a model, we devise a two-stage merging strategy in HarmonyBatch to judiciously batch the multi-SLO DNN inference requests into application groups. It aims to minimize the budget of function provisioning for each application group while guaranteeing the diverse performance SLOs of inference applications. We have implemented a prototype of HarmonyBatch on Alibaba Cloud Function Compute. Extensive prototype experiments with representative DNN inference workloads demonstrate that HarmonyBatch can provide predictable performance to serverless DNN inference workloads while reducing the monetary cost by up to 82.9% compared to state-of-the-art methods.
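The abstract describes grouping applications with diverse SLOs so their requests can share batches. As a rough intuition only (this is not HarmonyBatch's actual two-stage merging strategy, and the latency model, constants, and names below are invented for illustration), a greedy sketch might merge applications, sorted by SLO, into a group whenever a simple linear batch-latency model still satisfies the strictest SLO in that group:

```python
# Hypothetical toy illustration of SLO-aware application grouping.
# NOT the paper's algorithm: the linear latency model and all constants
# below are assumptions made purely for demonstration.
from dataclasses import dataclass

@dataclass
class App:
    name: str
    slo_ms: float    # latency SLO of the application
    rate_rps: float  # request arrival rate (unused in this toy sketch)

# Assumed linear batch-latency model: latency(b) = BASE + PER_REQ * b.
BASE_MS, PER_REQ_MS = 20.0, 5.0

def batch_latency_ms(batch_size: int) -> float:
    return BASE_MS + PER_REQ_MS * batch_size

def merge_by_slo(apps, max_batch=8):
    """Greedily place each app (tightest SLO first) into the first group
    whose combined batch still meets the strictest SLO in the group."""
    groups: list[list[App]] = []
    for app in sorted(apps, key=lambda a: a.slo_ms):
        for g in groups:
            strictest = min(a.slo_ms for a in g + [app])
            if len(g) + 1 <= max_batch and batch_latency_ms(len(g) + 1) <= strictest:
                g.append(app)
                break
        else:
            groups.append([app])  # no compatible group: start a new one
    return groups

apps = [App("resnet", 50, 10), App("bert", 100, 2), App("vgg", 60, 5)]
groups = merge_by_slo(apps)
# With these toy numbers all three apps fit one group, since
# batch_latency_ms(3) = 35 ms <= 50 ms (the strictest SLO).
```

The real system additionally models CPU/GPU function pricing and GPU time-slicing, and optimizes the provisioned resources per group rather than just group membership.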

Original language: English
Title of host publication: 2024 IEEE/ACM 32nd International Symposium on Quality of Service, IWQoS 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350350128
DOIs
State: Published - 2024
Event: 32nd IEEE/ACM International Symposium on Quality of Service, IWQoS 2024 - Guangzhou, China
Duration: 19 Jun 2024 – 21 Jun 2024

Publication series

Name: IEEE International Workshop on Quality of Service, IWQoS
ISSN (Print): 1548-615X

Conference

Conference: 32nd IEEE/ACM International Symposium on Quality of Service, IWQoS 2024
Country/Territory: China
City: Guangzhou
Period: 19/06/24 – 21/06/24

Keywords

  • DNN inference
  • SLO guarantee
  • resource provisioning
  • serverless computing
