Control Flow Divergence Optimization by Exploiting Tensor Cores

  • Weiguang Pang
  • Xu Jiang*
  • Songran Liu
  • Lei Qiao
  • Kexue Fu
  • Longxiang Gao
  • Wang Yi

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Kernels are scheduled on Graphics Processing Units (GPUs) at the granularity of a GPU warp, a group of threads that must be scheduled together. When a kernel contains conditional branches, the threads within a warp may execute different branches sequentially, resulting in considerable utilization loss and unpredictable execution time. This problem is known as control flow divergence. In this work, we propose a novel method that predicts threads' execution paths before a kernel is launched by deploying a branch prediction network on the GPU's tensor cores, which can run efficiently in parallel with kernels on the CUDA cores, so that the divergence problem is eased to a large extent with minimal overhead. Combined with a well-designed thread data reorganization algorithm, this solution better mitigates the GPU's control flow divergence problem.
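The paper's reorganization algorithm is not detailed in the abstract, but the underlying idea can be illustrated with a minimal sketch: given per-thread branch predictions, permute the thread order so that threads expected to take the same branch fall into the same warp. The function names, the two-way branch assumption, and the simple partition strategy below are all hypothetical simplifications for illustration, not the authors' method.

```python
# Illustrative sketch (not the paper's algorithm): regroup thread indices
# by predicted branch so that warps become internally uniform.

WARP_SIZE = 32

def reorganize_threads(predictions):
    """predictions[i] is the predicted branch (0 or 1) for thread i.
    Returns a permutation of thread indices grouped by predicted branch."""
    taken = [i for i, p in enumerate(predictions) if p == 1]
    not_taken = [i for i, p in enumerate(predictions) if p == 0]
    return taken + not_taken

def divergent_warps(order, predictions):
    """Count warps whose threads are predicted to take both branches."""
    count = 0
    for w in range(0, len(order), WARP_SIZE):
        branches = {predictions[i] for i in order[w:w + WARP_SIZE]}
        count += len(branches) > 1
    return count

# Example: alternating branch outcomes make every warp divergent in the
# original thread order; after regrouping, each warp is uniform.
preds = [i % 2 for i in range(128)]
before = divergent_warps(list(range(128)), preds)   # 4 divergent warps
after = divergent_warps(reorganize_threads(preds), preds)  # 0
```

In a real kernel the reordering must also move (or re-index) each thread's input data so that thread `k` in the new order still reads its own operands, which is what makes a careful data reorganization algorithm necessary.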

Original language: English
Title of host publication: Proceedings of the 61st ACM/IEEE Design Automation Conference, DAC 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798400706011
DOIs
State: Published - 7 Nov 2024
Externally published: Yes
Event: 61st ACM/IEEE Design Automation Conference, DAC 2024 - San Francisco, United States
Duration: 23 Jun 2024 - 27 Jun 2024

Publication series

Name: Proceedings - Design Automation Conference
ISSN (Print): 0738-100X

Conference

Conference: 61st ACM/IEEE Design Automation Conference, DAC 2024
Country/Territory: United States
City: San Francisco
Period: 23/06/24 - 27/06/24

