D-Cubicle: boosting data transfer dynamically for large-scale analytical queries in single-GPU systems

Jialun Wang, Wenhao Pang, Chuliang Weng, Aoying Zhou

Research output: Contribution to journal, Article, peer-review

6 Scopus citations

Abstract

In analytical queries, many important operators such as JOIN and GROUP BY are well suited to parallelization, and the GPU is an ideal accelerator given its parallel computing power. However, when the data size grows to hundreds of gigabytes, a single GPU card becomes insufficient because of the small capacity of its global memory and the slow data transfer between host and device. A straightforward solution is to add more GPUs linked by high-bandwidth connectors, but this increases the cost considerably. We use unified memory (UM), provided by NVIDIA CUDA (Compute Unified Device Architecture), to make it possible to accelerate large-scale queries on just one GPU. We observe, however, that data transfer between host and UM, which happens before kernel execution, often runs significantly below the theoretical bandwidth. An important reason is that, in a single-GPU environment, data processing systems usually invoke only one thread, or a static number of threads, for data copy, leading to inefficient transfers that heavily slow down overall performance. In this paper, we present D-Cubicle, a runtime module that accelerates data transfer between host-managed memory and unified memory. D-Cubicle boosts the actual transfer speed dynamically through a self-adaptive approach. In our experiments, with data transfer taken into account, D-Cubicle processes 200 GB of data on a single GPU with 32 GB of global memory, achieving 1.43x the performance of the baseline system on average and up to 2.09x.
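
The copy-thread bottleneck described in the abstract can be illustrated with a minimal C++ sketch. It is not D-Cubicle's implementation: `parallel_copy` and its parameters are hypothetical names introduced here, and the destination is an ordinary buffer standing in for a unified-memory allocation (for example one obtained from `cudaMallocManaged`), so the sketch builds without a GPU. It only shows how a host-side copy can be split across a caller-chosen number of threads; how to pick that number adaptively at runtime is the problem the paper addresses and is not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical helper (not from the paper): copy `bytes` bytes from `src` to
// `dst` using up to `num_threads` worker threads, one contiguous chunk each.
// `dst` is assumed to be any writable buffer; in a CUDA program it could be
// a unified-memory allocation, but nothing here depends on CUDA.
void parallel_copy(void* dst, const void* src, std::size_t bytes,
                   unsigned num_threads) {
    assert(dst != nullptr && src != nullptr);
    num_threads = std::max(1u, num_threads);  // treat 0 as a single thread

    // Ceiling division so the chunks together cover every byte.
    const std::size_t chunk = (bytes + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t offset = static_cast<std::size_t>(t) * chunk;
        if (offset >= bytes) break;  // nothing left for remaining threads
        const std::size_t len = std::min(chunk, bytes - offset);
        workers.emplace_back([=] {
            std::memcpy(static_cast<char*>(dst) + offset,
                        static_cast<const char*>(src) + offset, len);
        });
    }
    for (auto& w : workers) w.join();  // wait until the whole range is copied
}
```

Calling `parallel_copy(dst, src, n, 1)` corresponds to the single-thread copy the abstract identifies as a bottleneck, while a fixed larger value corresponds to the static-thread-count case; whether more threads actually help depends on the memory system and the transfer size.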

Original language: English
Article number: 174610
Journal: Frontiers of Computer Science
Volume: 17
Issue number: 4
State: Published - Aug 2023

Keywords

  • GPU
  • data analytics
  • unified memory
