Filtered runahead execution with a runahead buffer

Abstract
Runahead execution dynamically expands the instruction window of an out-of-order processor to generate memory-level parallelism (MLP) while the core would otherwise be stalled. Unfortunately, runahead requires the front-end to remain active to supply instructions. We propose a new structure, the Runahead Buffer, for supplying these instructions. We observe that cache misses are often caused by repetitive, short dependence chains. We store these dependence chains in the runahead buffer and use it to supply instructions during runahead. This generates 2x more MLP than traditional runahead on average because the core can run further ahead. It also saves energy, since the front-end can be clock-gated, reducing dynamic energy consumption. Over a no-prefetching/prefetching baseline, the result is a performance benefit of 17.2%/7.8% and an energy reduction of 6.7%/4.5%, respectively. Traditional runahead with additional energy optimizations results in a performance benefit of 12.1%/5.9% but an energy increase of 9.5%/5.4%. Finally, we propose a hybrid policy that switches between the runahead buffer and traditional runahead, maximizing performance.
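The abstract mentions filtering out the short dependence chain that leads to a cache-missing load so it can be replayed from the runahead buffer. The sketch below is only an illustration of that idea, not the paper's hardware algorithm: it performs a backward dataflow slice over a hypothetical buffer of recently retired micro-ops, keeping only the producers of registers the missing load (transitively) depends on. The `Uop` record, register names, and example program are invented for the illustration.

```python
# Minimal sketch of dependence-chain filtering for a cache-missing load.
# Assumption: a software stand-in for a buffer of retired micro-ops; the real
# mechanism operates on hardware structures not described in the abstract.
from dataclasses import dataclass


@dataclass
class Uop:
    opcode: str
    dest: str | None          # destination register, if any
    srcs: tuple[str, ...]     # source registers


def extract_chain(retired: list[Uop], miss_idx: int) -> list[Uop]:
    """Walk backward from the missing load, keeping only producers of
    registers the chain still needs (a backward dataflow slice)."""
    needed = set(retired[miss_idx].srcs)
    chain = [retired[miss_idx]]
    for uop in reversed(retired[:miss_idx]):
        if uop.dest is not None and uop.dest in needed:
            chain.append(uop)
            needed.discard(uop.dest)
            needed.update(uop.srcs)
    chain.reverse()
    return chain


if __name__ == "__main__":
    retired = [
        Uop("mov",  "r1", ("r9",)),       # unrelated to the miss
        Uop("load", "r2", ("r3",)),       # r2 = [r3]  (pointer load)
        Uop("add",  "r4", ("r2", "r5")),  # compute the next address
        Uop("sub",  "r7", ("r8",)),       # unrelated to the miss
        Uop("load", "r6", ("r4",)),       # r6 = [r4]  <- cache miss
    ]
    for u in extract_chain(retired, miss_idx=4):
        print(u)
```

Replaying only the three filtered micro-ops (the pointer load, the address computation, and the missing load), rather than fetching the full instruction stream, is what allows the front-end to be clock-gated while runahead still generates MLP.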
