XeFlow: Streamlining Inter-Processor Pipeline Execution for the Discrete CPU-GPU Platform

Zhifang Li, Beicheng Peng, Chuliang Weng

Research output: Contribution to journal › Article › peer-review


Abstract

Nowadays, GPUs achieve high-throughput computing by running large numbers of threads. However, owing to the disjoint memory spaces of discrete CPU-GPU systems, exploiting both the CPU and the GPU within a data processing pipeline is non-trivial: in essence, it can only be done through the coarse-grained 'copy-kernel-copy' workflow or its variants, whose frequent inter-processor invocations become an underlying bottleneck at fine-grained batch sizes. This article presents XeFlow, which enables streamlined execution by leveraging hardware mechanisms inside new-generation GPUs. XeFlow significantly reduces the costly explicit copies and kernel launches of existing approaches; as an alternative, it introduces persistent operators that continuously process data through shared topics, which establish efficient inter-processor data channels via hardware page faults. Compared with the default 'copy-kernel-copy' method, XeFlow shows up to 2.4× ∼ 3.1× performance advantages in both coarse-grained and fine-grained pipeline execution. To demonstrate its potential, this article also evaluates two GPU-accelerated applications: data encoding and OLAP query processing.
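The contrast the abstract draws, per-batch 'copy-kernel-copy' invocations versus a persistent operator draining a shared topic, can be sketched with a minimal CPU-only analogy in Python. This is not XeFlow's implementation (which relies on GPU hardware page faults); the `kernel`, `topic`, and batch names below are purely illustrative stand-ins.

```python
import queue
import threading

def kernel(batch):
    """Hypothetical stand-in for a GPU kernel: squares each element."""
    return [x * x for x in batch]

def copy_kernel_copy(batches):
    """Coarse-grained baseline: one explicit copy + kernel launch per batch,
    so per-invocation overhead recurs for every fine-grained batch."""
    results = []
    for batch in batches:
        staged = list(batch)       # stands in for the host-to-device copy
        out = kernel(staged)       # stands in for a kernel launch
        results.extend(list(out))  # stands in for the device-to-host copy
    return results

def persistent_operator(batches):
    """Streamed alternative: one long-lived worker continuously drains a
    shared topic, so setup cost is paid once instead of per batch."""
    topic, results = queue.Queue(), []

    def worker():                  # the persistent operator
        while True:
            batch = topic.get()
            if batch is None:      # sentinel ends the stream
                return
            results.extend(kernel(batch))

    t = threading.Thread(target=worker)
    t.start()
    for batch in batches:
        topic.put(batch)           # batches flow through the shared channel
    topic.put(None)
    t.join()
    return results
```

Both paths produce identical results; the difference is purely in how often the expensive "invocation" boundary is crossed, which is the overhead XeFlow's hardware-backed channels are designed to avoid.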

Original language: English
Article number: 8964470
Pages (from-to): 819-831
Number of pages: 13
Journal: IEEE Transactions on Computers
Volume: 69
Issue number: 6
DOIs
State: Published - 1 Jun 2020

Keywords

  • CPU-GPU programming
  • GPU scheduling
  • heterogeneous memory system
