
AMPS-Inf: Automatic Model Partitioning for Serverless Inference with Cost Efficiency

  • Jananie Jarachanthan
  • Li Chen
  • Fei Xu
  • Bo Li

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

The pay-per-use nature of serverless computing has driven its steady adoption as an alternative computing paradigm for various workloads. Yet challenges remain open when shifting machine learning workloads to the serverless environment. Specifically, the restriction on deployment size on serverless platforms, combined with the complexity of neural network models, makes it difficult to deploy large models in a single serverless function. In this paper, we aim to fully exploit the advantages of the serverless computing paradigm for machine learning workloads, with the goal of reducing management effort and overall cost while meeting the response-time Service Level Objective (SLO). We design and implement AMPS-Inf, an autonomous framework customized for model inference in serverless computing. Driven by cost efficiency and timely response, AMPS-Inf automatically generates the optimal execution and resource provisioning plans for inference workloads. The core of AMPS-Inf is the formulation and solution of a Mixed-Integer Quadratic Programming problem for model partitioning and resource provisioning, with the objective of minimizing cost without violating the response-time SLO. We deploy AMPS-Inf on the AWS Lambda platform, evaluate it with state-of-the-art pre-trained models in Keras, including ResNet50, Inception-V3, and Xception, and compare it with Amazon SageMaker and three baselines. Experimental results demonstrate that AMPS-Inf achieves up to 98% cost saving without degrading response time performance.
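To make the trade-off in the abstract concrete, here is a toy sketch of joint partitioning and memory selection under a latency SLO. It is not the paper's Mixed-Integer Quadratic Programming formulation: it swaps in plain exhaustive search, and every name and number in it (`plan`, the latency model `overhead_s + work / memory`, the size limit, the memory options, the price) is a hypothetical assumption chosen for illustration only.

```python
from itertools import combinations, product


def contiguous_partitions(n):
    """Yield every split of layers 0..n-1 into contiguous (start, end) groups."""
    for k in range(n):  # k = number of cut points
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


def plan(layer_mb, layer_work, slo_s, size_limit_mb=250.0,
         memory_gb=(0.5, 1.0, 2.0), overhead_s=0.2, price_per_gb_s=1.0):
    """Return (cost, latency, partitions, memories) for the cheapest feasible
    plan, or None if no plan meets the SLO.

    Assumptions (all hypothetical): partitions run one after another, each
    partition takes overhead_s + work / memory seconds, and each function is
    billed as duration * memory * price_per_gb_s.
    """
    best = None
    for parts in contiguous_partitions(len(layer_mb)):
        # Each partition must fit within the per-function deployment size limit.
        if any(sum(layer_mb[a:b]) > size_limit_mb for a, b in parts):
            continue
        # Try every memory assignment for this partitioning.
        for mems in product(memory_gb, repeat=len(parts)):
            times = [overhead_s + sum(layer_work[a:b]) / m
                     for (a, b), m in zip(parts, mems)]
            latency = sum(times)
            if latency > slo_s:
                continue  # violates the response-time SLO
            cost = sum(t * m * price_per_gb_s for t, m in zip(times, mems))
            if best is None or cost < best[0]:
                best = (cost, latency, parts, mems)
    return best


if __name__ == "__main__":
    # Three hypothetical 100 MB layers: too large for one function at 250 MB.
    print(plan([100, 100, 100], [1.0, 1.0, 1.0], slo_s=3.5))
```

Under these assumptions, a looser SLO lets the search pick smaller memory sizes and a lower cost, while a tighter SLO forces larger memory. Exhaustive search grows exponentially with the number of layers, which is one reason a real system would hand the problem to an optimization solver instead.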

Original language: English
Title of host publication: 50th International Conference on Parallel Processing, ICPP 2021 - Main Conference Proceedings
Publisher: Association for Computing Machinery
ISBN (electronic): 9781450390682
Publication status: Published - 9 Aug 2021
Event: 50th International Conference on Parallel Processing, ICPP 2021 - Virtual, Online, United States
Duration: 9 Aug 2021 to 12 Aug 2021

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 50th International Conference on Parallel Processing, ICPP 2021
Country/Territory: United States
Virtual, Online
Period: 9/08/21 to 12/08/21
