TY - JOUR
T1 - XeFlow
T2 - Streamlining Inter-Processor Pipeline Execution for the Discrete CPU-GPU Platform
AU - Li, Zhifang
AU - Peng, Beicheng
AU - Weng, Chuliang
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/6/1
Y1 - 2020/6/1
N2 - Nowadays, GPUs achieve high-throughput computing by running massive numbers of threads. However, owing to the disjoint memory spaces of discrete CPU-GPU systems, exploiting the CPU and GPU within a data processing pipeline is a non-trivial issue, which in essence can only be resolved by the coarse-grained 'copy-kernel-copy' workflow or its variants. This causes an underlying bottleneck of frequent inter-processor invocations at fine-grained batch sizes. This article presents XeFlow, which enables streamlined execution by leveraging hardware mechanisms inside new-generation GPUs. XeFlow significantly reduces the costly explicit copies and kernel launches of existing approaches. As an alternative, XeFlow introduces persistent operators that continuously process data through shared topics, which establish efficient inter-processor data channels via hardware page faults. Compared with the default 'copy-kernel-copy' method, XeFlow shows up to 2.4×∼3.1× performance advantages in both coarse-grained and fine-grained pipeline execution. To demonstrate its potential, this article also evaluates two GPU-accelerated applications: data encoding and OLAP queries.
AB - Nowadays, GPUs achieve high-throughput computing by running massive numbers of threads. However, owing to the disjoint memory spaces of discrete CPU-GPU systems, exploiting the CPU and GPU within a data processing pipeline is a non-trivial issue, which in essence can only be resolved by the coarse-grained 'copy-kernel-copy' workflow or its variants. This causes an underlying bottleneck of frequent inter-processor invocations at fine-grained batch sizes. This article presents XeFlow, which enables streamlined execution by leveraging hardware mechanisms inside new-generation GPUs. XeFlow significantly reduces the costly explicit copies and kernel launches of existing approaches. As an alternative, XeFlow introduces persistent operators that continuously process data through shared topics, which establish efficient inter-processor data channels via hardware page faults. Compared with the default 'copy-kernel-copy' method, XeFlow shows up to 2.4×∼3.1× performance advantages in both coarse-grained and fine-grained pipeline execution. To demonstrate its potential, this article also evaluates two GPU-accelerated applications: data encoding and OLAP queries.
KW - CPU-GPU programming
KW - GPU scheduling
KW - heterogeneous memory system
UR - https://www.scopus.com/pages/publications/85078447744
U2 - 10.1109/TC.2020.2968302
DO - 10.1109/TC.2020.2968302
M3 - Article
AN - SCOPUS:85078447744
SN - 0018-9340
VL - 69
SP - 819
EP - 831
JO - IEEE Transactions on Computers
JF - IEEE Transactions on Computers
IS - 6
M1 - 8964470
ER -