Filtered runahead execution with a runahead buffer

Abstract
Runahead execution dynamically expands the instruction window of an out-of-order processor to generate memory-level parallelism (MLP) while the core would otherwise be stalled. Unfortunately, runahead requires the front-end to remain active to supply instructions. We propose a new structure, the Runahead Buffer, for supplying these instructions. We observe that cache misses are often caused by repetitive, short dependence chains. We store these dependence chains in the runahead buffer and use it to supply instructions during runahead. This generates 2x more MLP than traditional runahead on average because the core can run further ahead. It also saves energy, since the front-end can be clock-gated, reducing dynamic energy consumption. Over a no-prefetching/prefetching baseline, the result is a performance benefit of 17.2%/7.8% and an energy reduction of 6.7%/4.5%, respectively. Traditional runahead with additional energy optimizations results in a performance benefit of 12.1%/5.9% but an energy increase of 9.5%/5.4%. Finally, we propose a hybrid policy that switches between the runahead buffer and traditional runahead, maximizing performance.
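The abstract mentions filtering out the short dependence chain that leads to a cache-missing load so it can be replayed from the runahead buffer. The sketch below is only an illustration of that idea, not the paper's hardware algorithm: it performs a backward dataflow slice over a hypothetical buffer of recently retired micro-ops, keeping only the producers of registers the missing load (transitively) depends on. The `Uop` record, register names, and example program are invented for the illustration.

```python
# Minimal sketch of dependence-chain filtering for a cache-missing load.
# Assumption: a software stand-in for a buffer of retired micro-ops; the real
# mechanism operates on hardware structures not described in the abstract.
from dataclasses import dataclass


@dataclass
class Uop:
    opcode: str
    dest: str | None          # destination register, if any
    srcs: tuple[str, ...]     # source registers


def extract_chain(retired: list[Uop], miss_idx: int) -> list[Uop]:
    """Walk backward from the missing load, keeping only producers of
    registers the chain still needs (a backward dataflow slice)."""
    needed = set(retired[miss_idx].srcs)
    chain = [retired[miss_idx]]
    for uop in reversed(retired[:miss_idx]):
        if uop.dest is not None and uop.dest in needed:
            chain.append(uop)
            needed.discard(uop.dest)
            needed.update(uop.srcs)
    chain.reverse()
    return chain


if __name__ == "__main__":
    retired = [
        Uop("mov",  "r1", ("r9",)),       # unrelated to the miss
        Uop("load", "r2", ("r3",)),       # r2 = [r3]  (pointer load)
        Uop("add",  "r4", ("r2", "r5")),  # compute the next address
        Uop("sub",  "r7", ("r8",)),       # unrelated to the miss
        Uop("load", "r6", ("r4",)),       # r6 = [r4]  <- cache miss
    ]
    for u in extract_chain(retired, miss_idx=4):
        print(u)
```

Replaying only the three filtered micro-ops (the pointer load, the address computation, and the missing load), rather than fetching the full instruction stream, is what allows the front-end to be clock-gated while runahead still generates MLP.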
