Criticality Driven Fetch

Abstract

Modern out-of-order (OoO) cores achieve high performance using large instruction windows. Scaling the window size improves performance by exposing more of the parallelism present in programs. However, this leads to an exponential increase in area and power. We propose Criticality Driven Fetch (CDF), a new execution paradigm that preferentially fetches, allocates, and executes instructions on the critical path of the program. Because non-critical instructions are skipped, the critical instructions held in the reorder buffer (ROB) can span a sequential instruction window larger than the ROB itself. This increases the amount of parallelism that can be extracted from critical instructions, thereby improving performance. In our implementation, CDF improves performance by (a) increasing the memory-level parallelism (MLP) of independent loads executing concurrently, (b) fetching critical-path loads past hard-to-predict branches (by resolving those branches earlier), and (c) initiating earlier the last-level cache misses that cannot be parallelized. Accelerating critical loads using CDF achieves a 6.1% IPC improvement over a baseline OoO core with prefetching. Compared to Precise Runahead, the prior state-of-the-art work on accelerating last-level cache misses on the core, CDF provides better performance and reduces memory traffic and energy consumption by 4.0% and 7.2%, respectively.
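The window-span claim can be illustrated with a toy model. The sketch below is not the paper's mechanism: the function name `window_span`, the per-instruction criticality flags, and the instruction stream are all hypothetical, and the model ignores how CDF identifies critical instructions and how skipped instructions are later executed. It only counts how far into the sequential program the ROB reaches when it fills.

```python
from typing import Sequence


def window_span(critical: Sequence[bool], rob_size: int,
                skip_non_critical: bool) -> int:
    """Return how many sequential instructions the ROB contents span.

    critical[i] is True when instruction i is (assumed to be) on the
    critical path. With skip_non_critical=False every instruction takes
    a ROB entry; with True, only critical instructions are allocated.
    """
    occupied = 0  # ROB entries in use
    span = 0      # sequential instructions covered up to the last allocation
    for position, is_critical in enumerate(critical):
        if skip_non_critical and not is_critical:
            continue  # skipped: consumes no ROB entry in this toy model
        if occupied == rob_size:
            break     # ROB is full, allocation stops
        occupied += 1
        span = position + 1
    return span


# Hypothetical stream: one critical instruction followed by three non-critical ones.
stream = [True, False, False, False] * 4
conventional_span = window_span(stream, rob_size=4, skip_non_critical=False)
critical_only_span = window_span(stream, rob_size=4, skip_non_critical=True)
print(conventional_span, critical_only_span)
```

With a 4-entry ROB, allocating every instruction covers only the first 4 instructions of this stream, while allocating only the flagged ones reaches the critical instruction at position 12, so the same ROB spans 13 sequential instructions.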

Funding Information

NSF (National Science Foundation) (2011145)
