大数据处理系统中面向GPU 加速DNN 推理的模型共享

Translated title of the contribution: Model sharing for GPU-accelerated DNN inference in big data processing systems

Research output: Contribution to journal › Article › peer-review

1 Scopus citations

Abstract

Big data processing systems are widely used in academia and industry to handle DNN-based inference workloads in fields such as video analytics. In such workloads, multiple parallel inference tasks in the big data processing system repeatedly load the same read-only DNN model, so the system does not fully utilize GPU resources, creating a bottleneck that limits inference performance. This paper presents a model sharing technique for single GPU cards that enables multiple DNN inference tasks to share the same in-memory model. An allocator extends the model sharing technique to each GPU in a distributed environment. The method was implemented in Spark on a GPU platform to build a distributed data processing system that supports large-scale inference workloads. Tests show that for video analytics on the YOLO-v3 model, model sharing reduces GPU memory overhead and improves system throughput by up to 136% compared with the same system without model sharing.
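The core idea of model sharing can be illustrated with a minimal sketch: a process-wide cache keyed by GPU and model name, so that parallel inference tasks reuse one loaded model instead of each loading its own copy. This is not the paper's implementation; the class, function, and parameter names (`ModelCache`, `load_yolo`) are hypothetical, and the expensive GPU model load is stubbed out with a placeholder.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class ModelCache:
    """Process-wide cache so parallel inference tasks share one loaded
    model per (gpu_id, model_name) instead of each loading its own copy.
    Hypothetical sketch of the model-sharing idea, not the paper's code."""

    def __init__(self):
        self._lock = threading.Lock()
        self._models = {}
        self.load_count = 0  # for illustration: how many real loads happened

    def get(self, gpu_id, model_name, loader):
        key = (gpu_id, model_name)
        with self._lock:
            if key not in self._models:
                # Only the first task pays the load cost; the model is
                # read-only, so later tasks can safely share the object.
                self._models[key] = loader(gpu_id, model_name)
                self.load_count += 1
            return self._models[key]

def load_yolo(gpu_id, model_name):
    # Placeholder for an expensive load, e.g. copying YOLO-v3 weights
    # into the memory of the given GPU.
    return {"gpu": gpu_id, "model": model_name}

cache = ModelCache()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda _: cache.get(0, "yolo-v3", load_yolo),
                            range(32)))

print(cache.load_count)  # 1: 32 tasks, but the model was loaded once
```

In the paper's distributed setting, an allocator plays a role analogous to choosing the `gpu_id` key here, routing each task to a per-GPU shared copy of the model.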

Original language: Chinese (Traditional)
Pages (from-to): 1435-1441
Number of pages: 7
Journal: Qinghua Daxue Xuebao/Journal of Tsinghua University
Volume: 62
Issue number: 9
DOIs
State: Published - 15 Sep 2022

