TY - JOUR
T1 - Minimizing average schedule length under memory constraints by optimal partitioning and prefetching
AU - Wang, Zhong
AU - O'Neil, Timothy W.
AU - Sha, Edwin H.M.
PY - 2001/3
Y1 - 2001/3
N2 - Over the last 20 years, the performance gap between CPU and memory has been steadily increasing. As a result, a variety of techniques has been devised to hide that performance gap, from intermediate fast memories (caches) to various prefetching and memory management techniques for manipulating the data present in these caches. In this paper we propose a new memory management technique that takes advantage of access pattern information that is available at compile time by prefetching certain data elements before explicitly being requested by the CPU, as well as maintaining certain data in the local memory over a number of iterations. In order to better take advantage of the locality of reference present in loop structures, our technique also uses a new approach to memory by partitioning it and reducing execution to each partition, so that information is reused at much smaller time intervals than if execution followed the usual pattern. These combined approaches - using a new set of memory instructions as well as partitioning the memory - lead to improvements in total execution time of approximately 25% over existing methods.
AB - Over the last 20 years, the performance gap between CPU and memory has been steadily increasing. As a result, a variety of techniques has been devised to hide that performance gap, from intermediate fast memories (caches) to various prefetching and memory management techniques for manipulating the data present in these caches. In this paper we propose a new memory management technique that takes advantage of access pattern information that is available at compile time by prefetching certain data elements before explicitly being requested by the CPU, as well as maintaining certain data in the local memory over a number of iterations. In order to better take advantage of the locality of reference present in loop structures, our technique also uses a new approach to memory by partitioning it and reducing execution to each partition, so that information is reused at much smaller time intervals than if execution followed the usual pattern. These combined approaches - using a new set of memory instructions as well as partitioning the memory - lead to improvements in total execution time of approximately 25% over existing methods.
UR - https://www.scopus.com/pages/publications/0035280424
U2 - 10.1023/A:1008114531225
DO - 10.1023/A:1008114531225
M3 - 文章
AN - SCOPUS:0035280424
SN - 0922-5773
VL - 27
SP - 215
EP - 233
JO - Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology
JF - Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology
IS - 3
ER -