TY - GEN
T1 - Control Flow Divergence Optimization by Exploiting Tensor Cores
AU - Pang, Weiguang
AU - Jiang, Xu
AU - Liu, Songran
AU - Qiao, Lei
AU - Fu, Kexue
AU - Gao, Longxiang
AU - Yi, Wang
N1 - Publisher Copyright: © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.
PY - 2024/11/7
Y1 - 2024/11/7
N2 - Kernels are scheduled on Graphics Processing Units (GPUs) at the granularity of a warp, a group of threads that must be scheduled together. When a kernel contains conditional branches, the threads within a warp may take different branches, which are then executed sequentially, resulting in considerable utilization loss and unpredictable execution time. This problem is known as control flow divergence. In this work, we propose a novel method that predicts each thread's execution path before the kernel is launched by deploying a branch prediction network on the GPU's tensor cores, which can run efficiently in parallel with kernels on the CUDA cores, so that divergence is eased to a large extent with minimal overhead. Combined with a well-designed thread data reorganization algorithm, this solution can better mitigate the GPU control flow divergence problem.
AB - Kernels are scheduled on Graphics Processing Units (GPUs) at the granularity of a warp, a group of threads that must be scheduled together. When a kernel contains conditional branches, the threads within a warp may take different branches, which are then executed sequentially, resulting in considerable utilization loss and unpredictable execution time. This problem is known as control flow divergence. In this work, we propose a novel method that predicts each thread's execution path before the kernel is launched by deploying a branch prediction network on the GPU's tensor cores, which can run efficiently in parallel with kernels on the CUDA cores, so that divergence is eased to a large extent with minimal overhead. Combined with a well-designed thread data reorganization algorithm, this solution can better mitigate the GPU control flow divergence problem.
UR - https://www.scopus.com/pages/publications/85211162720
U2 - 10.1145/3649329.3658462
DO - 10.1145/3649329.3658462
M3 - Conference contribution
AN - SCOPUS:85211162720
T3 - Proceedings - Design Automation Conference
BT - Proceedings of the 61st ACM/IEEE Design Automation Conference, DAC 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 61st ACM/IEEE Design Automation Conference, DAC 2024
Y2 - 23 June 2024 through 27 June 2024
ER -