Criticality Driven Fetch
- 17 October 2021
- conference paper
- Published by Association for Computing Machinery (ACM) in MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture
Abstract
Modern OoO cores achieve high levels of performance using large instruction windows. Scaling the window size improves performance by exposing more of the parallelism present in programs. However, this leads to an exponential increase in area and power. We specify Criticality Driven Fetch (CDF), a new execution paradigm that preferentially fetches, allocates, and executes instructions on the critical path of the program. By skipping over non-critical instructions, critical instructions in the ROB can span a sequential instruction window larger than the size of the ROB. This increases the amount of parallelism that can be extracted from critical instructions, thereby improving performance. In our implementation, CDF improves performance by (a) increasing the MLP for independent loads executing concurrently, (b) fetching critical-path loads past hard-to-predict branches (by resolving those branches earlier), and (c) initiating earlier the last-level cache misses that cannot be parallelized. Accelerating critical loads using CDF achieves a 6.1% IPC improvement over a baseline OoO core with prefetching. Compared to Precise Runahead, the prior state-of-the-art work on accelerating last-level cache misses on the core, we provide better performance and reduce memory traffic and energy consumption by 4.0% and 7.2%, respectively.
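The abstract's claim that skipping non-critical instructions lets a fixed-size ROB span a larger sequential window can be illustrated with a toy model. This is a hedged sketch, not the paper's mechanism: the `Inst` record, its `critical` and `llc_miss_load` flags, and the functions `window_span` and `visible_llc_misses` are hypothetical. Criticality is assumed to be known in advance, and the model ignores that skipped instructions must still be fetched and executed later.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Inst:
    critical: bool               # hypothetical flag: instruction lies on the critical path
    llc_miss_load: bool = False  # hypothetical flag: load that misses in the last-level cache


def window_span(trace: List[Inst], rob_size: int, critical_only: bool) -> int:
    """Program-order instructions covered once the ROB fills (toy model)."""
    occupied = 0
    for i, inst in enumerate(trace):
        if critical_only and not inst.critical:
            continue  # idealized CDF: a skipped instruction takes no ROB entry yet
        occupied += 1
        if occupied == rob_size:
            return i + 1  # sequential distance reached by the full ROB
    return len(trace)  # trace ended before the ROB filled


def visible_llc_misses(trace: List[Inst], rob_size: int, critical_only: bool) -> int:
    """LLC-miss loads resident in the ROB at once: a rough proxy for MLP."""
    span = window_span(trace, rob_size, critical_only)
    return sum(
        1
        for inst in trace[:span]
        if inst.llc_miss_load and (inst.critical or not critical_only)
    )


# Example trace: every 4th instruction is critical, every 8th is a critical LLC-miss load.
trace = [Inst(critical=(i % 4 == 0), llc_miss_load=(i % 8 == 0)) for i in range(64)]
```

With an 8-entry ROB, the baseline window covers 8 instructions and holds one LLC-miss load, while the critical-only window reaches 29 instructions and holds four, so four misses can overlap instead of one. That is the kind of MLP gain the abstract lists as point (a), under this sketch's idealized assumptions.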
Funding Information
- NSF (National Science Foundation) (2011145)