ShadowVM: Accelerating data plane for data analytics with bare metal CPUs and GPUs

Zhifang Li, Mingcong Han, Shangwei Wu, Chuliang Weng

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

3 Scopus citations

Abstract

With the development of the big data ecosystem, large-scale data analytics has become more prevalent in the past few years. Systems such as Apache Spark provide a flexible approach to scalable processing of massive data. However, they are not designed for compute-intensive workloads because of the restrictions of the JVM runtime. In contrast, the GPU has become the de facto accelerator for graphics rendering and deep learning in recent years. Nevertheless, the current architecture of big data systems makes it difficult to take advantage of GPUs and other accelerators. We argue that it is time to remove this obstacle by changing the fundamental architecture. To integrate accelerators efficiently, we decouple the control plane and the data plane within big data systems via action shadowing. The control plane keeps the logical information so that it fits well with host systems such as Spark, while the data plane holds the data and performs execution on bare metal CPUs and GPUs. Under this decoupled architecture, both the control plane and the data plane can use the approaches best suited to them without breaking existing mechanisms. Based on this idea, we implement an accelerated data plane named ShadowVM. In our experiments on the SSB benchmark, ShadowVM achieves up to a 14.7× speedup over JVM-based Spark. Furthermore, ShadowVM can also outperform GPU-only execution by adopting mixed CPU-GPU execution.

Original language: English
Title of host publication: PPoPP 2021 - Proceedings of the 2021 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Publisher: Association for Computing Machinery
Pages: 147-160
Number of pages: 14
ISBN (Electronic): 9781450382946
State: Published - 17 Feb 2021
Event: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2021 - Virtual, Online, Korea, Republic of
Duration: 27 Feb 2021 to 3 Mar 2021

Publication series

Name: Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP

Conference

Conference: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2021
Country/Territory: Korea, Republic of
City: Virtual, Online
Period: 27/02/21 to 3/03/21

Keywords

  • GPU
  • big data processing
  • heterogeneous system
