Improving the effectiveness of software prefetching with adaptive executions
- 24 December 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- ISSN 1089-795X, pp. 68-78
- https://doi.org/10.1109/pact.1996.552556
Abstract
The effectiveness of software prefetching for tolerating latency depends mainly on the ability of programmers and/or compilers to: 1) predict in advance the magnitude of the run-time remote memory latency, and 2) insert prefetches at a distance that minimizes stall time without causing cache pollution. Scalable heterogeneous multiprocessors, such as networks of computers (NOWs), present special challenges to static software prefetching because on these systems the network topology and node configuration are not completely determined at compile time. Furthermore, dynamic software prefetching cannot do much better, because individual nodes in large heterogeneous NOWs tend to experience different remote memory delays over time. A fixed prefetch distance, even when computed at run time, cannot perform well for the whole duration of a software pipeline. Here we present an adaptive scheme for software prefetching that allows nodes to change dynamically not only the amount of prefetching but also the prefetch distance. This makes it possible to tailor the execution of a software pipeline to the conditions prevailing at each node. We show how simple performance data collected by hardware monitors can allow programs to observe, evaluate and change their prefetching policies. Our results show that, on the simulated benchmarks, adaptive prefetching improved performance over static and dynamic prefetching by 10% to 60%. More importantly, future increases in the heterogeneity and size of NOWs will widen the advantage of adaptive prefetching over static and dynamic schemes.
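The abstract does not spell out the authors' adaptation rule, so the following is only a minimal sketch of the general idea under stated assumptions. It uses the common rule of thumb for prefetch distance, d = ceil(latency / cycles of work per iteration), plus a hypothetical per-node feedback controller (`AdaptiveDistance`) that lengthens the distance when monitor readings show late prefetches and shortens it when prefetched lines are evicted unused. The counter names, thresholds, doubling/halving policy, and the idealised latency model in `simulate_window` (one remote miss per iteration, fixed work per iteration) are all illustrative assumptions, not the paper's method.

```python
import math
from dataclasses import dataclass


def prefetch_distance(latency_cycles: float, cycles_per_iter: float) -> int:
    """Iterations to prefetch ahead so data arrives just in time:
    d = ceil(latency / cycles of work per iteration), never below 1."""
    return max(1, math.ceil(latency_cycles / cycles_per_iter))


def simulate_window(distance: int, latency: float, cycles_per_iter: float,
                    iters: int, max_useful_distance: int):
    """Idealised monitor readings for one sampling window (assumption:
    one remote miss per iteration, fixed work per iteration).
    Returns (stall_cycles, busy_cycles, late_prefetches, evicted_unused)."""
    slack = distance * cycles_per_iter          # cycles a prefetch has to complete
    stall_per_iter = max(0.0, latency - slack)  # leftover latency is exposed as stall
    late = iters if stall_per_iter > 0 else 0
    # Prefetching further ahead than the cache can hold evicts lines before use.
    evicted = iters if distance > max_useful_distance else 0
    return stall_per_iter * iters, cycles_per_iter * iters, late, evicted


@dataclass
class AdaptiveDistance:
    """Hypothetical per-node feedback controller for the prefetch distance."""
    distance: int
    min_d: int = 1
    max_d: int = 64           # assumed upper bound on the distance
    stall_tol: float = 0.05   # assumed tolerated stall/busy cycle ratio

    def update(self, stall_cycles: float, busy_cycles: float,
               late_prefetches: int, evicted_unused: int) -> int:
        stalling = busy_cycles > 0 and stall_cycles / busy_cycles > self.stall_tol
        if stalling and late_prefetches > evicted_unused:
            # Prefetches arrive too late: look further ahead.
            self.distance = min(self.max_d, self.distance * 2)
        elif evicted_unused > late_prefetches:
            # Prefetched lines are evicted unused (cache pollution): pull back.
            self.distance = max(self.min_d, self.distance // 2)
        return self.distance


if __name__ == "__main__":
    cpi = 50  # cycles of work per loop iteration (assumed)
    ctl = AdaptiveDistance(distance=prefetch_distance(200, cpi))  # starts at 4
    # Remote latency seen by this node jumps from 200 to 800 cycles mid-run.
    for latency in [200, 800, 800, 800, 800]:
        readings = simulate_window(ctl.distance, latency, cpi, 100, 32)
        ctl.update(*readings)
    print(ctl.distance)  # 16, matching prefetch_distance(800, cpi)
```

In this toy model, keeping the initial distance of 4 after the latency jump would leave 600 stall cycles per iteration; the controller instead settles on the distance the static rule would pick for the new latency. Doubling and halving is just one possible policy and can oscillate when the stall and pollution signals conflict.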