Accelerating Recommendation Inference via GPU Streams

  • Yuean Niu
  • Zhizhen Xu
  • Chen Xu*
  • Jiaqiang Wang

  *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

Deep-learning-based recommendation is widely used in industrial recommendation services. To predict user preferences accurately, state-of-the-art recommendation models contain an increasing number of features and a growing variety of feature-interaction methods, both of which lengthen inference time. We observe that the embedding lookups and feature interactions of different features in a recommendation model are independent of one another. However, current deep learning frameworks (e.g., TensorFlow, PyTorch) are oblivious to this independence and schedule the operators to execute sequentially in a single computational stream. In this work, we exploit multiple CUDA streams to parallelize the execution of embedding lookup and feature interaction. To further overlap the processing of different sparse features and minimize synchronization overhead, we propose a topology-aware operator assignment algorithm that schedules operators onto computational streams. We implement a prototype, StreamRec, based on TensorFlow XLA. Our experiments show that StreamRec reduces latency by up to 27.8% and increases throughput by up to 52% compared with the original TensorFlow XLA.
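
The abstract does not spell out the assignment algorithm, so the following is only a minimal sketch of the general idea, not StreamRec's actual method. It assumes a toy operator graph (one lookup and one interaction operator per sparse feature, followed by a shared concat and MLP) and a simple heuristic: an operator inherits the stream of a predecessor whose stream has not yet been passed on, and otherwise receives a stream round-robin. All names here (`example_graph`, `assign_streams`, `cross_stream_edges`) are hypothetical, and the streams are plain integers rather than real CUDA streams.

```python
from collections import deque

# Hypothetical operator graph: op -> list of successor ops.
example_graph = {
    "lookup_a": ["interact_a"],
    "lookup_b": ["interact_b"],
    "lookup_c": ["interact_c"],
    "interact_a": ["concat"],
    "interact_b": ["concat"],
    "interact_c": ["concat"],
    "concat": ["mlp"],
    "mlp": [],
}


def topo_order(graph):
    """Return the ops in a topological order (Kahn's algorithm)."""
    indeg = {op: 0 for op in graph}
    for succs in graph.values():
        for s in succs:
            indeg[s] += 1
    ready = deque(op for op, d in indeg.items() if d == 0)
    order = []
    while ready:
        op = ready.popleft()
        order.append(op)
        for s in graph[op]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    if len(order) != len(graph):
        raise ValueError("graph has a cycle")
    return order


def assign_streams(graph, num_streams):
    """Map each op to a stream id in [0, num_streams).

    Illustrative heuristic only: keep each chain of dependent ops on one
    stream so that no synchronization is needed along the chain, and
    spread independent chains across streams round-robin.
    """
    preds = {op: [] for op in graph}
    for op, succs in graph.items():
        for s in succs:
            preds[s].append(op)

    stream = {}
    claimed = set()  # predecessors whose stream was already inherited
    next_stream = 0
    for op in topo_order(graph):
        parent = next((p for p in preds[op] if p not in claimed), None)
        if parent is not None:
            stream[op] = stream[parent]
            claimed.add(parent)
        else:
            stream[op] = next_stream % num_streams
            next_stream += 1
    return stream


def cross_stream_edges(graph, stream):
    """Count dependencies that cross streams (each needs a sync event)."""
    return sum(
        1
        for op, succs in graph.items()
        for s in succs
        if stream[op] != stream[s]
    )


if __name__ == "__main__":
    for n in (1, 2, 3):
        assignment = assign_streams(example_graph, n)
        print(n, assignment, cross_stream_edges(example_graph, assignment))
```

In this toy setting, each feature's lookup and interaction stay on the same stream, so the only cross-stream dependencies are those where independent branches merge at the concat operator. A real implementation would additionally have to issue the operators on actual CUDA streams and insert events at each cross-stream edge.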

Original language: English
Title of host publication: Database Systems for Advanced Applications - 28th International Conference, DASFAA 2023, Proceedings
Editors: Xin Wang, Maria Luisa Sapino, Wook-Shin Han, Amr El Abbadi, Gill Dobbie, Zhiyong Feng, Yingxiao Shao, Hongzhi Yin
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 546-561
Number of pages: 16
ISBN (Print): 9783031306365
DOIs
State: Published - 2023
Event: 28th International Conference on Database Systems for Advanced Applications, DASFAA 2023 - Tianjin, China
Duration: 17 Apr 2023 → 20 Apr 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13943 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 28th International Conference on Database Systems for Advanced Applications, DASFAA 2023
Country/Territory: China
City: Tianjin
Period: 17/04/23 → 20/04/23

Keywords

  • CUDA Stream
  • Inference Service
  • Operator Assignment
  • Parallelization
  • Recommendation model
