CRISP: critical slice prefetching

conference paper
Published by Association for Computing Machinery (ACM) in Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

https://doi.org/10.1145/3503222.3507745

Abstract

The high access latency of DRAM continues to be a performance challenge for contemporary microprocessor systems. Prefetching is a well-established technique to address this problem; however, existing implemented designs fail to provide any performance benefits in the presence of irregular memory access patterns. The hardware complexity of prior techniques that can predict irregular memory accesses, such as runahead execution, has proven untenable for implementation in real hardware. We propose a lightweight mechanism to hide the high latency of irregular memory access patterns by leveraging criticality-based scheduling. In particular, our technique executes delinquent loads and their load slices as early as possible, hiding a significant fraction of their latency. Furthermore, we observe that the latency induced by branch mispredictions and other high-latency instructions can be hidden with a similar approach. Our proposal requires only minimal hardware modifications because it performs memory access classification, load and branch slice extraction, and priority analysis exclusively in software. As a result, our technique is feasible to implement, introducing only a simple new instruction prefix while requiring minimal modifications to the instruction scheduler. Our technique increases the IPC of memory-latency-bound applications by up to 38% and by 8.4% on average.
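
The two ideas in the abstract, extracting the slice of a delinquent load and letting the scheduler prefer slice instructions, can be illustrated with a small Python sketch. This is a toy model written for this summary and is not the paper's implementation: the `Instr` record, the six-instruction `TRACE`, and the helpers `backward_slice` and `pick_next` are all hypothetical, and the sketch tracks only register dependences (memory dependences are ignored).

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: opcode, destination and source registers."""
    op: str
    dst: Optional[str]
    srcs: Tuple[str, ...]


def backward_slice(trace: List[Instr], target_idx: int) -> List[int]:
    """Indices of the target and of every earlier instruction it depends on
    through registers. Memory dependences are ignored (an assumption)."""
    needed: Set[str] = set(trace[target_idx].srcs)
    members = [target_idx]
    for i in range(target_idx - 1, -1, -1):
        ins = trace[i]
        if ins.dst is not None and ins.dst in needed:
            # This instruction produces a value the slice needs, so it joins
            # the slice and its own sources become needed in turn.
            needed.discard(ins.dst)
            needed.update(ins.srcs)
            members.append(i)
    return sorted(members)


def pick_next(ready: List[int], critical: Set[int]) -> int:
    """Among ready instructions, prefer critical ones, then the oldest."""
    return min(ready, key=lambda i: (i not in critical, i))


# Toy trace: instruction 5 is the delinquent load; its address comes from load 3.
TRACE = [
    Instr("mov", "r1", ()),             # 0: index
    Instr("add", "r5", ("r6", "r7")),   # 1: unrelated work
    Instr("shl", "r2", ("r1",)),        # 2: scale the index
    Instr("load", "r3", ("r2",)),       # 3: load a pointer
    Instr("mul", "r8", ("r5", "r5")),   # 4: unrelated work
    Instr("load", "r4", ("r3",)),       # 5: delinquent load
]

critical = set(backward_slice(TRACE, 5))
print(sorted(critical))            # [0, 2, 3, 5]
print(pick_next([1, 2], critical)) # 2
```

With instructions 1 and 2 both ready, an oldest-first policy would issue instruction 1, while the criticality-aware choice issues instruction 2, which moves the delinquent load's address computation forward. In the scheme the abstract describes, the slice analysis happens in software and the result is conveyed to hardware through an instruction prefix; the set `critical` merely stands in for that tagging here.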

