TY - GEN
T1 - Polynomial-time nested loop fusion with full parallelism
AU - Sha, E. H.M.
AU - Lang, Chenhua
AU - Passos, N. L.
N1 - Publisher Copyright: © 1996 IEEE.
PY - 1996
Y1 - 1996
N2 - Data locality and synchronization overhead are two important factors that affect the performance of applications on multiprocessors. Loop fusion is an effective way of reducing synchronization and improving data locality. Traditional fusion techniques, however, either cannot handle nested loops that contain fusion-preventing dependences or cannot achieve good parallelism after fusion. This paper improves on them by presenting several efficient polynomial-time algorithms that solve these problems. These algorithms, combined with the retiming technique, allow nested loop fusion in the presence of outermost loop-carried dependences as well as fusion-preventing dependences. Furthermore, the technique is proved to achieve fully parallel execution of the fused loops.
AB - Data locality and synchronization overhead are two important factors that affect the performance of applications on multiprocessors. Loop fusion is an effective way of reducing synchronization and improving data locality. Traditional fusion techniques, however, either cannot handle nested loops that contain fusion-preventing dependences or cannot achieve good parallelism after fusion. This paper improves on them by presenting several efficient polynomial-time algorithms that solve these problems. These algorithms, combined with the retiming technique, allow nested loop fusion in the presence of outermost loop-carried dependences as well as fusion-preventing dependences. Furthermore, the technique is proved to achieve fully parallel execution of the fused loops.
UR - https://www.scopus.com/pages/publications/72049084826
U2 - 10.1109/ICPP.1996.538554
DO - 10.1109/ICPP.1996.538554
M3 - Conference contribution
AN - SCOPUS:72049084826
T3 - Proceedings of the International Conference on Parallel Processing
SP - 9
EP - 16
BT - Software
A2 - Pingali, K.
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 25th International Conference on Parallel Processing, ICPP 1996
Y2 - 12 August 1996 through 16 August 1996
ER -