Continuous optimization

1 January 2005

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 86-97
https://doi.org/10.1109/isca.2005.19

Abstract

This paper presents a hardware-based dynamic optimizer that continuously optimizes an application's instruction stream. In continuous optimization, dataflow optimizations are performed using simple, table-based hardware placed in the rename stage of the processor pipeline. The continuous optimizer reduces dataflow height by performing constant propagation, reassociation, redundant load elimination, store forwarding, and silent store removal. To enhance the impact of the optimizations, the optimizer integrates values generated by the execution units back into the optimization process. Continuous optimization allows instructions with input values known at optimization time to be executed in the optimizer, leaving less work for the out-of-order portion of the pipeline. Continuous optimization can detect branch mispredictions earlier and thus reduce the misprediction penalty. In this paper, we present a detailed description of a hardware optimizer and evaluate it in the context of a contemporary microarchitecture running current workloads. Our analysis of SPECint, SPECfp, and MediaBench workloads reveals that a hardware optimizer can directly execute 33% of instructions, resolve 29% of mispredicted branches, and generate addresses for 76% of memory operations. These positive effects combine to provide speedups in the range 0.99 to 1.27.
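
To make two of the optimizations named in the abstract more concrete, the following is a minimal software sketch of table-based constant propagation and reassociation. It is not the paper's hardware design: the instruction tuples, the `Entry` record, the `optimize` function, and the single-assignment assumption on destination registers are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """Symbolic register value: base + offset, or a pure constant if base is None."""
    base: Optional[str]
    offset: int


def optimize(instrs):
    """Toy rename-time optimizer (illustrative only).

    Assumes each destination register is written at most once, so a table
    entry never refers to a stale base register.
    Returns (instructions still needing execution, count resolved early).
    """
    table = {}           # register name -> Entry
    remaining = []       # instructions passed on to the execution core
    resolved_early = 0   # instructions whose result is known in the optimizer

    for op, rd, *src in instrs:
        if op == "li":                       # load immediate: value is known
            table[rd] = Entry(None, src[0])
            resolved_early += 1
        elif op == "addi":                   # rd = rs + imm
            rs, imm = src
            known = table.get(rs, Entry(rs, 0))
            # Reassociation: fold the new immediate into the tracked offset,
            # so a chain of addi instructions depends only on the original base.
            new = Entry(known.base, known.offset + imm)
            table[rd] = new
            if new.base is None:             # constant propagation
                resolved_early += 1
            else:
                remaining.append(("addi", rd, new.base, new.offset))
        else:
            raise ValueError(f"unsupported op: {op}")
    return remaining, resolved_early


program = [
    ("li", "r1", 16),
    ("addi", "r2", "r1", 4),   # both inputs known: resolved in the optimizer
    ("addi", "r4", "r3", 4),   # r3 unknown: must execute
    ("addi", "r5", "r4", 4),   # rewritten to depend on r3 directly
]
remaining, resolved_early = optimize(program)
print(remaining)
print(resolved_early)
```

In this sketch the last instruction is rewritten from `r4 + 4` to `r3 + 8`, so it no longer waits on the instruction that produces `r4`; this is the sense in which reassociation shortens a dependence chain.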

Cited by 16 articles