Abstract
The large latency of memory accesses in modern computers is a key obstacle to achieving high processor utilization. As a result, a variety of techniques have been devised to hide this latency, ranging from cache hierarchies to various prefetching and memory management techniques for manipulating the data present in the caches. In DSP applications, the prevalence of large numbers of uniform nested loops makes loop scheduling very important. In this paper, we propose a new memory management technique for computer architectures with three levels of memory, the scheme generally adopted in contemporary computer architectures. This technique takes advantage of access pattern information available at compile time by prefetching certain data elements from the higher-level memory before they are explicitly requested by the lower-level memory or CPU. It also retains certain data for a period of time to prevent unnecessary data swapping. To better exploit the locality of reference present in these loop structures, our technique introduces a new approach to memory management: the memory is partitioned and execution is restricted to one partition at a time, so that data locality is much improved compared with the usual pattern. These combined approaches, using a new set of memory instructions as well as partitioning the memory, lead to improvements in average execution times of approximately 35% over the one-level partition algorithm and more than 80% over list scheduling and hardware prefetching.
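The general idea of combining partitioned (tiled) execution with compile-time-directed prefetching can be illustrated with a minimal sketch. This is not the paper's algorithm: the tile size, prefetch distance, and the `tiled_sum` function are illustrative assumptions, and the sketch uses the GCC/Clang `__builtin_prefetch` hint as a stand-in for the paper's proposed memory instructions.

```c
#include <stddef.h>

/* Illustrative tile size; in practice it would be chosen to match the
 * capacity of the faster memory level. */
#define TILE 64

/* Process an array one partition (tile) at a time, issuing a prefetch
 * hint for the next tile while the current one is being consumed, so
 * the memory transfer overlaps with computation. */
static long tiled_sum(const int *a, size_t n)
{
    long sum = 0;
    for (size_t t = 0; t < n; t += TILE) {
        if (t + TILE < n) {
            /* Hint: bring the first line of the next tile into cache
             * before it is explicitly requested (read access). */
            __builtin_prefetch(&a[t + TILE], 0, 1);
        }
        size_t end = (t + TILE < n) ? t + TILE : n;
        for (size_t i = t; i < end; i++)
            sum += a[i];
    }
    return sum;
}
```

Restricting each pass to one tile keeps the working set within the faster memory level, while the prefetch hides the latency of fetching the next tile from the slower level.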
| Original language | English |
|---|---|
| Pages (from-to) | 2853-2864 |
| Number of pages | 12 |
| Journal | IEEE Transactions on Signal Processing |
| Volume | 49 |
| Issue number | 11 |
| DOIs | |
| State | Published - Nov 2001 |
| Externally published | Yes |
Keywords
- Latency hiding
- Memory hierarchy
- Partitioning
- Prefetching
- Scheduling