CRISP: critical slice prefetching
- 22 February 2022
- conference paper
- Published by Association for Computing Machinery (ACM) in Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
Abstract
The high access latency of DRAM continues to be a performance challenge for contemporary microprocessor systems. Prefetching is a well-established technique to address this problem; however, existing implemented designs fail to provide any performance benefits in the presence of irregular memory access patterns. The hardware complexity of prior techniques that can predict irregular memory accesses, such as runahead execution, has proven untenable for implementation in real hardware. We propose a lightweight mechanism to hide the high latency of irregular memory access patterns by leveraging criticality-based scheduling. In particular, our technique executes delinquent loads and their load slices as early as possible, hiding a significant fraction of their latency. Furthermore, we observe that the latency induced by branch mispredictions and other high-latency instructions can be hidden with a similar approach. Our proposal requires only minimal hardware modifications because it performs memory access classification, load and branch slice extraction, and priority analysis exclusively in software. As a result, our technique is feasible to implement, introducing only a simple new instruction prefix while requiring minimal modifications to the instruction scheduler. Our technique increases the IPC of memory-latency-bound applications by up to 38% and by 8.4% on average.
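The abstract does not say how load slices are extracted, so the following is only an illustrative sketch of the general idea of a backward slice, not the authors' algorithm. It assumes a hypothetical straight-line instruction trace in which each instruction names one destination register and its source registers, and it follows register dependencies only (dependencies through memory and control flow are ignored). The names `Instr`, `backward_slice`, and `TRACE` are invented for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """One instruction in a hypothetical, simplified trace."""
    idx: int                 # position in the trace
    op: str                  # mnemonic, for readability only
    dst: Optional[str]       # destination register, if any
    srcs: Tuple[str, ...]    # source registers


def backward_slice(trace: List[Instr], target_idx: int) -> List[int]:
    """Return the indices of the target instruction and of every earlier
    instruction it transitively depends on through registers."""
    needed = set(trace[target_idx].srcs)   # registers whose producers we still need
    slice_idxs = [target_idx]
    # Walk backwards; the first producer found for a register is its latest definition.
    for instr in reversed(trace[:target_idx]):
        if instr.dst is not None and instr.dst in needed:
            slice_idxs.append(instr.idx)
            needed.discard(instr.dst)      # this definition satisfies the need
            needed.update(instr.srcs)      # now its own inputs are needed
    return sorted(slice_idxs)


# Hypothetical trace: instruction 4 stands in for a delinquent load
# whose address is computed by instructions 0 and 2.
TRACE = [
    Instr(0, "load", "r1", ("r0",)),
    Instr(1, "add",  "r2", ("r5", "r6")),   # unrelated to the load address
    Instr(2, "add",  "r3", ("r1", "r4")),
    Instr(3, "mul",  "r7", ("r2", "r2")),   # unrelated to the load address
    Instr(4, "load", "r8", ("r3",)),
]

critical = backward_slice(TRACE, 4)
print(critical)
```

In a scheme like the one the abstract describes, the instructions in such a slice would be the candidates for marking as critical (for example, with the instruction prefix) so that the scheduler can issue them ahead of unrelated work; how that marking and prioritization are actually done is not specified here.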