Architectural and compiler support for effective instruction prefetching
- 1 February 2001
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Computer Systems
- Vol. 19 (1), 71-109
- https://doi.org/10.1145/367742.367786
Abstract
Instruction cache miss latency is becoming an increasingly important performance bottleneck, especially for commercial applications. Although instruction prefetching is an attractive technique for tolerating this latency, we find that existing prefetching schemes are insufficient for modern superscalar processors, since they fail to issue prefetches early enough (particularly for nonsequential accesses). To overcome these limitations, we propose a new instruction prefetching technique whereby the hardware and software cooperate to hide the latency as follows. The hardware performs aggressive sequential prefetching combined with a novel prefetch filtering mechanism that allows it to get far ahead without polluting the cache. To hide the latency of nonsequential accesses, we propose and implement a novel compiler algorithm that automatically inserts instruction prefetches for the targets of control transfers far enough in advance. Our experimental results demonstrate that this new approach hides 50% or more of the latency remaining with the best previous techniques, while at the same time reducing the number of useless prefetches by a factor of six. We find that both the prefetch filtering and compiler-inserted prefetching components of our design are essential and complementary, and that the compiler can limit the code expansion to only 9% on average. In addition, we show that the performance of our technique can be further increased by using profiling information to help reduce cache conflicts and unnecessary prefetches. From an architectural perspective, these performance advantages are sustained over a range of common miss latencies and bandwidths. Finally, our technique is cost-effective as well, since it delivers performance comparable to (or even better than) that of larger caches, but requires a much smaller hardware budget.
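The abstract pairs aggressive sequential hardware prefetching with a filter that suppresses prefetches observed to be useless. The toy simulation below is a minimal sketch of that idea in Python; the FIFO-with-use-promotion cache model, the filter heuristic (suppress any line that was prefetched but evicted before use), and all parameter values are illustrative assumptions, not the paper's actual mechanism.

```python
from collections import OrderedDict

CACHE_LINES = 8          # toy instruction-cache capacity (illustrative)
PREFETCH_DEGREE = 4      # how far ahead the sequential prefetcher runs

class PrefetchFilterCache:
    """Toy I-cache with sequential prefetching and a simple prefetch
    filter: a line that was prefetched but evicted before any use is
    never prefetched again (an illustrative stand-in for the paper's
    filtering mechanism)."""

    def __init__(self):
        self.cache = OrderedDict()  # line -> True if prefetched and still unused
        self.filtered = set()       # lines the filter now suppresses
        self.misses = 0
        self.useless_prefetches = 0

    def _insert(self, line, prefetched):
        if line in self.cache:
            return
        if len(self.cache) >= CACHE_LINES:
            victim, unused = self.cache.popitem(last=False)  # evict oldest
            if unused:              # evicted before any demand use: useless
                self.useless_prefetches += 1
                self.filtered.add(victim)
        self.cache[line] = prefetched

    def fetch(self, line):
        if line in self.cache:
            self.cache[line] = False        # demand use clears the unused flag
            self.cache.move_to_end(line)    # keep recently used lines resident
        else:
            self.misses += 1
            self._insert(line, prefetched=False)
        # Aggressive sequential prefetch, gated by the filter.
        for d in range(1, PREFETCH_DEGREE + 1):
            nxt = line + d
            if nxt not in self.cache and nxt not in self.filtered:
                self._insert(nxt, prefetched=True)

# Sequential run, a jump to a distant target, then a loop back.
sim = PrefetchFilterCache()
for pc in list(range(0, 6)) + [100, 101, 102] + list(range(0, 6)):
    sim.fetch(pc)
print(sim.misses, sim.useless_prefetches, sorted(sim.filtered))
```

In this trace, prefetching ahead of the sequential runs keeps most fetches hitting in the cache, while lines fetched past the jump (and past the loop's end) are evicted unused, recorded by the filter, and not prefetched again on the second pass through the loop.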