Continuous optimization
- 1 January 2005
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
This paper presents a hardware-based dynamic optimizer that continuously optimizes an application's instruction stream. In continuous optimization, dataflow optimizations are performed using simple, table-based hardware placed in the rename stage of the processor pipeline. The continuous optimizer reduces dataflow height by performing constant propagation, reassociation, redundant load elimination, store forwarding, and silent store removal. To enhance the impact of the optimizations, the optimizer integrates values generated by the execution units back into the optimization process. Continuous optimization allows instructions with input values known at optimization time to be executed in the optimizer, leaving less work for the out-of-order portion of the pipeline. Continuous optimization can also detect branch mispredictions earlier and thus reduce the misprediction penalty. In this paper, we present a detailed description of a hardware optimizer and evaluate it in the context of a contemporary microarchitecture running current workloads. Our analysis of SPECint, SPECfp, and MediaBench workloads reveals that a hardware optimizer can directly execute 33% of instructions, resolve 29% of mispredicted branches, and generate addresses for 76% of memory operations. These positive effects combine to provide speedups in the range 0.99 to 1.27.
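The table-based constant propagation and reassociation mentioned in the abstract can be illustrated with a small software sketch. This is not the paper's hardware design: the instruction tuples, the `Entry` record, and the `optimize` function are hypothetical names chosen for illustration, only two opcodes (`li`, `addi`) are modeled, and each register is assumed to be written at most once, which stands in for the physical register tags available after renaming.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    """What the (hypothetical) optimizer table knows about one register."""
    base: Optional[str]  # register the value depends on; None if fully known
    offset: int          # the constant itself, or the immediate added to base


# (opcode, destination, source register or None, immediate)
Instr = Tuple[str, str, Optional[str], int]


def optimize(stream: List[Instr]) -> Tuple[List[Instr], int, Dict[str, Entry]]:
    """Return (instructions left for the back end, count executed early, table).

    Assumption: every register is written at most once, so a table entry
    that names a base register never becomes stale.
    """
    table: Dict[str, Entry] = {}
    remaining: List[Instr] = []
    executed_early = 0

    for op, dst, src, imm in stream:
        if op == "li":
            # Load-immediate: the value is known at optimization time.
            table[dst] = Entry(None, imm)
            executed_early += 1
        elif op == "addi" and src is not None:
            known = table.get(src, Entry(src, 0))
            if known.base is None:
                # Constant propagation: compute the result in the optimizer.
                table[dst] = Entry(None, known.offset + imm)
                executed_early += 1
            else:
                # Reassociation: (base + a) + b becomes base + (a + b), so the
                # new instruction depends on base directly, not on src.
                folded = Entry(known.base, known.offset + imm)
                table[dst] = folded
                remaining.append(("addi", dst, folded.base, folded.offset))
        else:
            # Any other opcode is opaque to this sketch and passes through.
            remaining.append((op, dst, src, imm))

    return remaining, executed_early, table
```

In this sketch, a chain such as `r4 = r3 + 1; r5 = r4 + 2` is rewritten so that `r5` is computed as `r3 + 3`, which shortens the dependence chain by one step, while an `addi` whose source holds a known constant never reaches the back end at all.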