TY - GEN
T1 - ShadowVM
T2 - 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2021
AU - Li, Zhifang
AU - Han, Mingcong
AU - Wu, Shangwei
AU - Weng, Chuliang
N1 - Publisher Copyright: © 2021 ACM.
PY - 2021/2/17
Y1 - 2021/2/17
AB - With the development of the big data ecosystem, large-scale data analytics has become more prevalent in the past few years. Apache Spark, etc., provide a flexible approach for scalable processing upon massive data. However, they are not designed for handling computing-intensive workloads due to the restrictions of JVM runtime. In contrast, GPU has been the de facto accelerator for graphics rendering and deep learning in recent years. Nevertheless, the current architecture makes it difficult to take advantage of GPUs and other accelerators in the big data world. Now, it is time to break down this obstacle by changing the fundamental architecture. To integrate accelerators efficiently, we decouple the control plane and the data plane within big data systems via action shadowing. The control plane keeps logic information to fit well with the host systems like Spark, while the data plane holds data and performs execution upon bare metal CPUs and GPUs. Under this decoupled architecture, both the control plane and the data plane could leverage the appropriate approaches without breaking existing mechanisms. Based on this idea, we implement an accelerated data plane, namely ShadowVM. In our experiments on the SSB benchmark, ShadowVM lifts the JVM-based Spark with up to 14.7× speedup. Furthermore, ShadowVM could also outperform the GPU-only fashion by adopting mixed CPU-GPU execution.
KW - GPU
KW - big data processing
KW - heterogeneous system
UR - https://www.scopus.com/pages/publications/85101700787
U2 - 10.1145/3437801.3441595
DO - 10.1145/3437801.3441595
M3 - Conference contribution
AN - SCOPUS:85101700787
T3 - Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP
SP - 147
EP - 160
BT - PPoPP 2021 - Proceedings of the 2021 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
PB - Association for Computing Machinery
Y2 - 27 February 2021 through 3 March 2021
ER -