TY - GEN
T1 - Performance Modeling of Stencil Computation on SW26010 Processors
AU - Liu, Yao
AU - Liu, Li
AU - Hu, Mengtao
AU - Wang, Wei
AU - Xue, Wei
AU - Zhu, Qingting
N1 - Publisher Copyright: © 2020, Springer Nature Switzerland AG.
PY - 2020
Y1 - 2020
AB - Stencil computation is a basic part of a large variety of scientific computing programs, especially those that solve partial differential equations. Because memory bandwidth is limited, improving the parallel efficiency of stencil computation on modern supercomputers is a challenge. Performance modeling is the most common method of performance analysis. In this paper, we propose a generic performance model for Sunway TaihuLight, which is powered by SW26010 heterogeneous many-core processors. From the architecture perspective, the generic model captures the interaction between programs and the computing platform; from the optimization perspective, it points out the performance bottlenecks of the programs. Furthermore, we propose a specific performance model of stencil computation on SW26010 processors and optimize the performance of stencil computation under the guidance of that model. The experimental results show that the proposed performance models are effective: the average error ratio of the predicted performance is less than 7%. Guided by the specific model, the optimized stencil computation outperforms the unoptimized many-core version by 154.71% on 4096 cores.
KW - Heterogeneous many-core processors
KW - Performance modeling
KW - Stencil computation
KW - Sunway TaihuLight
UR - https://www.scopus.com/pages/publications/85092646351
U2 - 10.1007/978-3-030-60245-1_27
DO - 10.1007/978-3-030-60245-1_27
M3 - Conference contribution
AN - SCOPUS:85092646351
SN - 9783030602444
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 386
EP - 400
BT - Algorithms and Architectures for Parallel Processing - 20th International Conference, ICA3PP 2020, Proceedings
A2 - Qiu, Meikang
PB - Springer Science and Business Media Deutschland GmbH
T2 - 20th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2020
Y2 - 2 October 2020 through 4 October 2020
ER -