Filtered runahead execution with a runahead buffer
- 5 December 2015
- conference paper
- Published by Association for Computing Machinery (ACM)
- p. 358-369
- https://doi.org/10.1145/2830772.2830812
Abstract
Runahead execution dynamically expands the instruction window of an out-of-order processor to generate memory-level parallelism (MLP) while the core would otherwise be stalled. Unfortunately, runahead has the disadvantage of requiring the front-end to remain active to supply instructions. We propose a new structure (the Runahead Buffer) for supplying these instructions. We note that cache misses are often caused by repetitive, short dependence chains. We store these dependence chains in the runahead buffer. During runahead, the runahead buffer is used to supply instructions. This generates 2x more MLP than traditional runahead on average because the core can run further ahead. It also saves energy since the front-end can be clock-gated, reducing dynamic energy consumption. Over a no-prefetching/prefetching baseline, the result is a performance benefit of 17.2%/7.8% and an energy reduction of 6.7%/4.5%, respectively. Traditional runahead with additional energy optimizations results in a performance benefit of 12.1%/5.9% but an energy increase of 9.5%/5.4%. Finally, we propose a hybrid policy that switches between the runahead buffer and traditional runahead, maximizing performance.
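The abstract does not say how a dependence chain is identified or replayed, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes the chain is found by walking backward through the instruction window from the missing load, keeping only the instructions that produce its address operands and stopping at the previous dynamic instance of the same load. The `Uop` type, the function names `extract_chain` and `runahead_supply`, and the 32-entry limit are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from itertools import cycle, islice
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Uop:
    """Hypothetical micro-op record: program counter, destination, sources."""
    pc: int
    dest: Optional[str]
    srcs: Tuple[str, ...]


def extract_chain(window: List[Uop], miss_index: int, max_len: int = 32) -> List[Uop]:
    """Backward data-flow walk from the missing load (assumed policy).

    Collects only the uops that produce registers the load's address
    depends on, and stops at the previous instance of the same load PC,
    so the result covers one iteration of the repeating chain.
    """
    miss = window[miss_index]
    needed = set(miss.srcs)
    chain = [miss]
    for uop in reversed(window[:miss_index]):
        if uop.pc == miss.pc or len(chain) >= max_len:
            break
        if uop.dest in needed:
            needed.discard(uop.dest)
            needed.update(uop.srcs)
            chain.append(uop)
    chain.reverse()  # restore program order
    return chain


def runahead_supply(chain: List[Uop], count: int) -> List[Uop]:
    """Replay the stored chain in a loop, standing in for the front-end."""
    return list(islice(cycle(chain), count))


# Two iterations of a toy loop: pointer bump, unrelated multiply,
# load through the pointer, and a consumer of the loaded value.
iteration = [
    Uop(0x10, "r1", ("r1",)),        # r1 = r1 + 8
    Uop(0x14, "r5", ("r6", "r7")),   # unrelated to the load address
    Uop(0x18, "r2", ("r1",)),        # load r2 <- [r1], the missing load
    Uop(0x1C, "r3", ("r3", "r2")),   # consumer of the load
]
window = iteration + iteration
chain = extract_chain(window, miss_index=6)
supplied = runahead_supply(chain, 5)
```

In this toy window the extracted chain contains only the pointer increment and the load, so the multiply and the consumer are filtered out. Replaying that short chain is what would let the fetch and decode stages sit idle during runahead in this model.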