TY - JOUR
T1 - Novel hierarchical strategies for SPH-centric algorithms on GPGPU
AU - Huang, Kemeng
AU - Zhao, Zipeng
AU - Li, Chen
AU - Wang, Changbo
AU - Qin, Hong
N1 - Publisher Copyright:
© 2020 Elsevier Inc.
PY - 2020/9
Y1 - 2020/9
N2 - This research further extends an existing state-of-the-art GPU framework of SPH simulation for better performance without any compromise of numerical accuracy and sacrifice of simulation detail. Towards this grand goal, we devise three new strategies. First, we design a novel hierarchical grid to decompose the simulation space, where we can locate particles in a more refined unit space. With this organization, particle distribution coherence in global memory is improved. As a result, global memory operations can be more efficient. Second, based on our well designed hierarchical grid, we propose a hierarchical neighbor search strategy for catering to the heterogeneous distribution of particles in cells, so we can search a neighbor particle in different-level grids. In a cell with dense particles, we can search particles in a higher-level grid whose unit space is more refined, which means we can narrow the search space for decreasing the access of false neighbor particles. In contrast, in a cell with sparse particles, we search particles in a lower-level grid whose unit space is relatively large, which means we can decrease the loop iterations when we search the entire neighbor cell. In this way, we can avoid the unnecessary overhead of loop iterations. After designing a reasonable neighbor search strategy, an efficient thread cooperation strategy can further improve the performance of our framework by realizing more potentials of GPGPU. The existing state-of-the-art method is concentrating on task assignment strategy for taking the full advantage of shared memory. The method decomposes particles in a cell into several tasks and then assigns a cooperation thread array (CTA) to each task. However, this method has not fully considered the uniformity of tasks belonging to the same cell, as these tasks always share a lot of particles with the same neighborhood. Finally, we propose a hierarchical task assignment strategy by merging such successive tasks into a larger task, which means we only load the same neighbor particles once and the corresponding CTAs are arranged to work together to handle the new task. Our method can greatly reduce the overload of neighbor particles and the overhead of neighbor search iterations. Through our comprehensive tests validated in practice, our work can exhibit 1.73× speedup when compared with other state-of-the-art GPGPU frameworks for SPH simulation.
AB - This research further extends an existing state-of-the-art GPU framework of SPH simulation for better performance without any compromise of numerical accuracy and sacrifice of simulation detail. Towards this grand goal, we devise three new strategies. First, we design a novel hierarchical grid to decompose the simulation space, where we can locate particles in a more refined unit space. With this organization, particle distribution coherence in global memory is improved. As a result, global memory operations can be more efficient. Second, based on our well designed hierarchical grid, we propose a hierarchical neighbor search strategy for catering to the heterogeneous distribution of particles in cells, so we can search a neighbor particle in different-level grids. In a cell with dense particles, we can search particles in a higher-level grid whose unit space is more refined, which means we can narrow the search space for decreasing the access of false neighbor particles. In contrast, in a cell with sparse particles, we search particles in a lower-level grid whose unit space is relatively large, which means we can decrease the loop iterations when we search the entire neighbor cell. In this way, we can avoid the unnecessary overhead of loop iterations. After designing a reasonable neighbor search strategy, an efficient thread cooperation strategy can further improve the performance of our framework by realizing more potentials of GPGPU. The existing state-of-the-art method is concentrating on task assignment strategy for taking the full advantage of shared memory. The method decomposes particles in a cell into several tasks and then assigns a cooperation thread array (CTA) to each task. However, this method has not fully considered the uniformity of tasks belonging to the same cell, as these tasks always share a lot of particles with the same neighborhood. Finally, we propose a hierarchical task assignment strategy by merging such successive tasks into a larger task, which means we only load the same neighbor particles once and the corresponding CTAs are arranged to work together to handle the new task. Our method can greatly reduce the overload of neighbor particles and the overhead of neighbor search iterations. Through our comprehensive tests validated in practice, our work can exhibit 1.73× speedup when compared with other state-of-the-art GPGPU frameworks for SPH simulation.
KW - Hierarchical Perpendicular Grid (HPG)
KW - Hierarchical task assignment
KW - Neighbor search
KW - Smoothed Particle Hydrodynamics (SPH)
UR - https://www.scopus.com/pages/publications/85089560814
U2 - 10.1016/j.gmod.2020.101088
DO - 10.1016/j.gmod.2020.101088
M3 - 文章
AN - SCOPUS:85089560814
SN - 1524-0703
VL - 111
JO - Graphical Models
JF - Graphical Models
M1 - 101088
ER -