Early Address Prediction

Open Access

8 June 2021

journal article
research article
Published by Association for Computing Machinery (ACM) in ACM Transactions on Architecture and Code Optimization

Vol. 18 (3), 1-22
https://doi.org/10.1145/3458883

Abstract

Achieving low load-to-use latency with low energy and storage overheads is critical for performance. Existing techniques either prefetch into the pipeline (via address prediction and validation) or provide data reuse in the pipeline (via register sharing or L0 caches). These techniques provide a range of tradeoffs between latency, reuse, and overhead. In this work, we present a pipeline prefetching technique that achieves state-of-the-art performance and data reuse without additional data storage, data movement, or validation overheads by adding address tags to the register file. Our addition of register file tags allows us to forward (reuse) load data from the register file with no additional data movement, keep the data alive in the register file beyond the instruction’s lifetime to increase temporal reuse, and coalesce prefetch requests to achieve spatial reuse. Further, we show that we can use the existing memory order violation detection hardware to validate prefetches and data forwards without additional overhead. Our design achieves the performance of existing pipeline prefetching while also forwarding 32% of the loads from the register file (compared to 15% in state-of-the-art register sharing), delivering a 16% reduction in L1 dynamic energy (1.6% total processor energy), with an area overhead of less than 0.5%.
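The register-file tagging idea in the abstract can be illustrated with a small behavioral model. The following is a minimal sketch under stated assumptions, not the paper's design: the names `PhysReg`, `TaggedRegisterFile`, and `run` are hypothetical, tags hold full addresses, lookup is a linear search, and a store eagerly clears matching tags. The abstract instead says validation reuses the existing memory order violation detection hardware, which this model does not attempt to reproduce. Forwarding is modeled as a value copy for simplicity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PhysReg:
    """A physical register with an optional address tag (hypothetical model)."""
    value: int = 0
    addr_tag: Optional[int] = None  # address the value was loaded from, if any


class TaggedRegisterFile:
    """Register file whose entries remember the load address of their data."""

    def __init__(self, num_regs: int) -> None:
        self.regs: List[PhysReg] = [PhysReg() for _ in range(num_regs)]

    def lookup(self, addr: int) -> Optional[int]:
        # Assumption: linear search stands in for an associative tag match.
        for idx, reg in enumerate(self.regs):
            if reg.addr_tag == addr:
                return idx
        return None

    def fill_from_load(self, reg: int, addr: int, value: int) -> None:
        self.regs[reg] = PhysReg(value=value, addr_tag=addr)

    def invalidate(self, addr: int) -> None:
        # Assumption: stores eagerly clear matching tags; the data stays.
        for reg in self.regs:
            if reg.addr_tag == addr:
                reg.addr_tag = None


def run(trace: List[Tuple[str, int, int]], memory: Dict[int, int],
        num_regs: int = 8) -> Tuple[Dict[str, int], TaggedRegisterFile]:
    """Replay (op, reg, addr) entries and count forwarded loads vs. L1 reads."""
    rf = TaggedRegisterFile(num_regs)
    stats = {"forwarded": 0, "l1_reads": 0}
    for op, reg, addr in trace:
        if op == "load":
            src = rf.lookup(addr)
            if src is not None:
                value = rf.regs[src].value      # reuse data already in the RF
                stats["forwarded"] += 1
            else:
                value = memory.get(addr, 0)     # fall back to the L1 / memory
                stats["l1_reads"] += 1
            rf.fill_from_load(reg, addr, value)
        elif op == "store":
            memory[addr] = rf.regs[reg].value
            rf.invalidate(addr)
        else:
            raise ValueError(f"unknown op: {op}")
    return stats, rf


if __name__ == "__main__":
    mem = {0x100: 7, 0x108: 9}
    ops = [("load", 0, 0x100), ("load", 1, 0x100), ("load", 2, 0x108),
           ("store", 2, 0x100), ("load", 3, 0x100)]
    print(run(ops, mem)[0])  # {'forwarded': 1, 'l1_reads': 3}
```

In this toy trace the second load of address 0x100 is served from the register file, while the load that follows the store to 0x100 goes back to memory because the matching tags were cleared. The counts say nothing about the 32% forwarding rate reported in the abstract, which comes from the authors' own evaluation.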
