Speculative precomputation: long-range prefetching of delinquent loads

13 November 2002

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 14-25
https://doi.org/10.1109/isca.2001.937427

Abstract

This paper explores Speculative Precomputation, a technique that uses idle thread contexts in a multithreaded architecture to improve performance of single-threaded applications. It attacks program stalls from data cache misses by pre-computing future memory accesses in available thread contexts, and prefetching these data. This technique is evaluated by simulating the performance of a research processor based on the Itanium™ ISA supporting Simultaneous Multithreading. Two primary forms of Speculative Precomputation are evaluated. If only the non-speculative thread spawns speculative threads, performance gains of up to 30% are achieved when assuming ideal hardware. However, this speedup drops considerably with more realistic hardware assumptions. Permitting speculative threads to directly spawn additional speculative threads reduces the overhead associated with spawning threads and enables significantly more aggressive speculation, overcoming this limitation. Even with realistic costs for spawning threads, speedups as high as 169% are achieved, with an average speedup of 76%.
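The core idea in the abstract, a helper thread computing future addresses and prefetching them before the main thread needs them, can be illustrated with a toy cycle-count model. This is only a sketch under stated assumptions, not the paper's simulator: the function name `simulate`, its parameters, and all latency values are hypothetical, prefetches are treated as free to issue, and thread-spawning overhead is ignored.

```python
def simulate(addresses, work_cycles, miss_latency, prefetch_distance):
    """Return the main thread's total cycles in a toy model (hypothetical).

    addresses         -- addresses touched by one delinquent load, in order
    work_cycles       -- non-memory work done per iteration
    miss_latency      -- cycles needed to bring a line in from memory
    prefetch_distance -- how many iterations ahead the speculative thread
                         runs; 0 disables speculative precomputation
    """
    ready_at = {}  # address -> cycle at which its line is in the cache
    cycle = 0
    for i, addr in enumerate(addresses):
        # Speculative thread (assumed free): precompute the address the main
        # thread will need `prefetch_distance` iterations from now and
        # start fetching it.
        if prefetch_distance > 0:
            j = i + prefetch_distance
            if j < len(addresses) and addresses[j] not in ready_at:
                ready_at[addresses[j]] = cycle + miss_latency

        # Main thread: the delinquent load.
        if addr in ready_at:
            # Line was prefetched; stall only if it is still in flight.
            cycle = max(cycle, ready_at[addr])
        else:
            # Demand miss: pay the full memory latency.
            cycle += miss_latency
            ready_at[addr] = cycle

        cycle += work_cycles
    return cycle


# Assumed workload: 1000 distinct addresses, so every load misses by default.
addresses = list(range(1000))
baseline = simulate(addresses, work_cycles=10, miss_latency=200, prefetch_distance=0)
short_range = simulate(addresses, work_cycles=10, miss_latency=200, prefetch_distance=4)
long_range = simulate(addresses, work_cycles=10, miss_latency=200, prefetch_distance=32)

print("baseline cycles:   ", baseline)
print("distance 4 cycles: ", short_range)
print("distance 32 cycles:", long_range)
```

In this model a short prefetch distance hides only part of the miss latency, while a distance long enough to cover the full latency removes nearly all stalls. That is the intuition behind "long-range" prefetching in the title; the actual gains the paper reports come from its own simulated hardware and include spawning costs that this sketch leaves out.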

