TY - JOUR
T1 - Parallelization of the estuarine saltwater intrusion numerical forecast model UFDECOM-i using Fortran DO CONCURRENT
AU - Guo, Hongyuan
AU - Chen, Bingrui
AU - Ma, Rui
AU - Wang, Yihe
AU - Zhu, Jianrong
N1 - Publisher Copyright: © 2026 Elsevier Ltd
PY - 2026/3
Y1 - 2026/3
N2 - High-resolution simulations of estuarine saltwater intrusion are computationally demanding and require efficient execution on heterogeneous computing platforms. This study investigated the use of the standard Fortran parallelization construct DO CONCURRENT to accelerate the unstructured quadrilateral grid finite-differencing estuarine and coastal ocean model (UFDECOM-i) within a unified codebase for both multicore CPUs and GPUs. Using the NVFORTRAN compiler, three versions were implemented: MC-UFDECOM-i on multicore CPUs, GPU-UFDECOM-i using automatic data migration, and GPUA-UFDECOM-i using lightweight OpenACC directives for explicit data management. The results show that DO CONCURRENT enables scalable shared-memory parallelism on CPUs, with speedups of up to 16.32×, and provides functional portability to GPUs without code modification. However, optimal GPU performance requires explicit data management, with GPUA-UFDECOM-i reaching a maximum speedup of 21.48×. These results demonstrate that DO CONCURRENT ensures portability and maintainability, whereas explicit data control remains essential for high GPU efficiency.
KW - Computational efficiency
KW - Fortran DO CONCURRENT
KW - GPU parallel computing
KW - High portability
KW - Saltwater intrusion model
UR - https://www.scopus.com/pages/publications/105029644986
U2 - 10.1016/j.envsoft.2026.106911
DO - 10.1016/j.envsoft.2026.106911
M3 - Article
AN - SCOPUS:105029644986
SN - 1364-8152
VL - 198
JO - Environmental Modelling and Software
JF - Environmental Modelling and Software
M1 - 106911
ER -