TY - GEN
T1 - Optimal Loop Tiling for Minimizing Write Operations on NVMs with Complete Memory Latency Hiding
AU - Xu, Rui
AU - Sha, Edwin Hsing Mean
AU - Zhuge, Qingfeng
AU - Song, Yuhong
AU - Lin, Jingzhi
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Non-volatile memory (NVM) is expected to serve as the second-level memory (called remote memory) in future two-level memory hierarchies. However, NVM has limited write endurance, so it is vital to reduce the number of write operations on NVM. Meanwhile, in a two-level memory hierarchy, prefetching is widely used to fetch data before it is actually required, hiding the remote memory access latency. In general, large-scale nested loops are the performance bottleneck of a program because of the write operations on NVM caused by misses in the first-level memory (called local memory) and by data reuse. Loop tiling is the key compiler technique for grouping iterations so as to reduce communication with remote memory. In this paper, we propose a new loop tiling approach for minimizing write operations on NVMs and completely hiding the NVM access latency. Specifically, we introduce a series of theorems to guide loop tiling. Then, a legal tile shape and an optimal tile size selection strategy are proposed according to data dependencies and local memory capacity. Furthermore, we propose a pipeline scheduling policy to completely hide the remote memory latency. Extensive experiments show that the proposed techniques reduce write operations on NVMs by 95.1% on average and that NVM latency can be completely hidden.
AB - Non-volatile memory (NVM) is expected to serve as the second-level memory (called remote memory) in future two-level memory hierarchies. However, NVM has limited write endurance, so it is vital to reduce the number of write operations on NVM. Meanwhile, in a two-level memory hierarchy, prefetching is widely used to fetch data before it is actually required, hiding the remote memory access latency. In general, large-scale nested loops are the performance bottleneck of a program because of the write operations on NVM caused by misses in the first-level memory (called local memory) and by data reuse. Loop tiling is the key compiler technique for grouping iterations so as to reduce communication with remote memory. In this paper, we propose a new loop tiling approach for minimizing write operations on NVMs and completely hiding the NVM access latency. Specifically, we introduce a series of theorems to guide loop tiling. Then, a legal tile shape and an optimal tile size selection strategy are proposed according to data dependencies and local memory capacity. Furthermore, we propose a pipeline scheduling policy to completely hide the remote memory latency. Extensive experiments show that the proposed techniques reduce write operations on NVMs by 95.1% on average and that NVM latency can be completely hidden.
KW - Loop tiling
KW - Memory latency hiding
KW - Non-volatile Memory
KW - Pipeline
KW - Write operations
UR - https://www.scopus.com/pages/publications/85126139382
U2 - 10.1109/ASP-DAC52403.2022.9712532
DO - 10.1109/ASP-DAC52403.2022.9712532
M3 - Conference contribution
AN - SCOPUS:85126139382
T3 - Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC
SP - 389
EP - 394
BT - ASP-DAC 2022 - 27th Asia and South Pacific Design Automation Conference, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 27th Asia and South Pacific Design Automation Conference, ASP-DAC 2022
Y2 - 17 January 2022 through 20 January 2022
ER -