
Accelerating Recommendation Inference via GPU Streams

  • Yuean Niu
  • Zhizhen Xu
  • Chen Xu*
  • Jiaqiang Wang
  • *Corresponding author for this work
  • East China Normal University
  • Shanghai Engineering Research Center of Big Data Management
  • Tencent

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Deep learning-based recommendation is widely used in industrial recommendation services. To predict user preferences accurately, state-of-the-art recommendation models contain an increasing number of features and various methods of feature interaction, both of which lengthen inference time. We observe that the embedding lookups and feature interactions of different features in a recommendation model are independent of each other. However, current deep learning frameworks (e.g., TensorFlow, PyTorch) are oblivious to this independence and schedule the operators to execute sequentially in a single computational stream. In this work, we exploit multiple CUDA streams to parallelize the execution of embedding lookup and feature interaction. To further overlap the processing of different sparse features and minimize synchronization overhead, we propose a topology-aware operator assignment algorithm that schedules operators onto computational streams. We implement a prototype, StreamRec, based on TensorFlow XLA. Our experiments show that StreamRec reduces latency by up to 27.8% and increases throughput by up to 52% compared with the original TensorFlow XLA.
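
The idea of assigning independent per-feature operator chains to separate streams can be illustrated with a small sketch. This is not the StreamRec algorithm, which the abstract does not specify; it is a simple greedy heuristic written under stated assumptions. The function `assign_streams`, the operator names, and the example graph are all hypothetical. Streams are modeled as plain integers, and an edge whose endpoints land on different streams is counted as a point where a real system would need cross-stream synchronization (for example, an event wait).

```python
from collections import defaultdict, deque


def assign_streams(ops, edges, num_streams):
    """Greedy sketch (hypothetical, not the paper's algorithm).

    Walks the operator DAG in topological order. An operator reuses the
    stream of a predecessor whose stream has not already been inherited by
    another successor; otherwise it takes the next stream round-robin.
    Returns the stream assignment and the edges that cross streams.
    """
    preds = defaultdict(list)
    succs = defaultdict(list)
    indegree = {op: 0 for op in ops}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
        indegree[v] += 1

    ready = deque(op for op in ops if indegree[op] == 0)
    stream_of = {}
    continued = set()  # operators whose stream a successor already inherited
    next_stream = 0

    while ready:
        op = ready.popleft()
        parent = next((p for p in preds[op] if p not in continued), None)
        if parent is not None:
            # Stay on the predecessor's stream: no synchronization needed.
            stream_of[op] = stream_of[parent]
            continued.add(parent)
        else:
            # Independent work: start on a fresh stream (round-robin).
            stream_of[op] = next_stream % num_streams
            next_stream += 1
        for s in succs[op]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)

    # Edges that cross streams are where synchronization would be required.
    sync_edges = [(u, v) for u, v in edges if stream_of[u] != stream_of[v]]
    return stream_of, sync_edges


# Hypothetical model: three sparse features, each with an embedding lookup
# followed by a feature-interaction op, merged by a concat and a final MLP.
ops = []
edges = []
for f in "abc":
    ops += [f"lookup_{f}", f"interact_{f}"]
    edges += [(f"lookup_{f}", f"interact_{f}"), (f"interact_{f}", "concat")]
ops += ["concat", "mlp"]
edges.append(("concat", "mlp"))

stream_of, sync_edges = assign_streams(ops, edges, num_streams=3)
for op in ops:
    print(f"{op:12s} -> stream {stream_of[op]}")
print("cross-stream edges:", sync_edges)
```

In this toy graph each feature's lookup and interaction stay on one stream, so synchronization is only needed where the per-feature results are merged. With `num_streams=1` the sketch degenerates to the sequential single-stream schedule that the abstract describes as the framework default.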

Original language: English
Title of host publication: Database Systems for Advanced Applications - 28th International Conference, DASFAA 2023, Proceedings
Editors: Xin Wang, Maria Luisa Sapino, Wook-Shin Han, Amr El Abbadi, Gill Dobbie, Zhiyong Feng, Yingxiao Shao, Hongzhi Yin
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 546-561
Number of pages: 16
ISBN (Print): 9783031306365
Publication status: Published - 2023
Event: 28th International Conference on Database Systems for Advanced Applications, DASFAA 2023 - Tianjin, China
Duration: 17 Apr 2023 to 20 Apr 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13943 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 28th International Conference on Database Systems for Advanced Applications, DASFAA 2023
Country/Territory: China
City: Tianjin
Period: 17/04/23 to 20/04/23
