TY - GEN
T1 - AMPS-Inf: Automatic Model Partitioning for Serverless Inference with Cost Efficiency
T2 - 50th International Conference on Parallel Processing, ICPP 2021
AU - Jarachanthan, Jananie
AU - Chen, Li
AU - Xu, Fei
AU - Li, Bo
N1 - Publisher Copyright:
© 2021 ACM.
PY - 2021/8/9
Y1 - 2021/8/9
AB - The salient pay-per-use nature of serverless computing has driven its continuous adoption as an alternative computing paradigm for various workloads. Yet challenges arise and remain open when shifting machine learning workloads to the serverless environment. Specifically, the restriction on deployment size on serverless platforms, combined with the complexity of neural network models, makes it difficult to deploy large models in a single serverless function. In this paper, we aim to fully exploit the advantages of the serverless computing paradigm for machine learning workloads, targeting the mitigation of management and overall cost while meeting the response-time Service Level Objective (SLO). We design and implement AMPS-Inf, an autonomous framework customized for model inference in serverless computing. Driven by cost-efficiency and timely response, AMPS-Inf automatically generates the optimal execution and resource provisioning plans for inference workloads. The core of AMPS-Inf relies on the formulation and solution of a Mixed-Integer Quadratic Programming problem for model partitioning and resource provisioning, with the objective of minimizing cost without violating the response-time SLO. We deploy AMPS-Inf on the AWS Lambda platform, evaluate it with state-of-the-art pre-trained models in Keras, including ResNet50, Inception-V3, and Xception, and compare it with Amazon SageMaker and three baselines. Experimental results demonstrate that AMPS-Inf achieves up to 98% cost savings without degrading response-time performance.
KW - cost efficiency
KW - machine learning inference
KW - serverless computing
UR - https://www.scopus.com/pages/publications/85117250730
U2 - 10.1145/3472456.3472501
DO - 10.1145/3472456.3472501
M3 - Conference contribution
AN - SCOPUS:85117250730
T3 - ACM International Conference Proceeding Series
BT - 50th International Conference on Parallel Processing, ICPP 2021 - Main Conference Proceedings
PB - Association for Computing Machinery
Y2 - 9 August 2021 through 12 August 2021
ER -