APT-GET
- 28 March 2022
- conference paper
- Published by Association for Computing Machinery (ACM)
Abstract
Prefetching, which predicts future memory accesses and preloads them from main memory, is a widely adopted technique to overcome the processor-memory performance gap. Unfortunately, hardware prefetchers implemented in today's processors cannot identify the complex and irregular memory access patterns exhibited by modern data-driven applications, and hence developers need to rely on software prefetching techniques. We investigate the challenges of enabling effective, automated software data prefetching. Our investigation reveals that the state-of-the-art compiler-based prefetching mechanism falls short of achieving high performance due to its static nature. Based on this insight, we design APT-GET, a novel profile-guided technique that ensures prefetch timeliness by leveraging dynamic execution time information. APT-GET leverages efficient hardware support, such as Intel's Last Branch Record (LBR), to collect application execution profiles with negligible overhead and characterize the execution time of loads. APT-GET then introduces a novel analytical model that finds the optimal prefetch distance and prefetch injection site based on the collected profile to enable timely prefetches. We study APT-GET in the context of 10 real-world applications and demonstrate that it achieves a speedup of up to 1.98× and 1.30× on average. By ensuring prefetch timeliness, APT-GET improves performance by 1.25× over the state-of-the-art software data prefetching mechanism.
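To make the notion of a prefetch distance concrete, here is a minimal C sketch of software prefetching in a loop with an indirect access pattern. It is an illustration under stated assumptions, not APT-GET's analytical model: `prefetch_distance` and `indirect_sum` are hypothetical names, the distance is simply the miss latency divided by the per-iteration time (rounded up), and the latency figures would have to come from measurement. `__builtin_prefetch` is the GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (an assumption, not the paper's model): how many
 * iterations ahead to prefetch so the data arrives before it is used.
 * Computed as ceil(miss_latency_cycles / iter_cycles). */
static size_t prefetch_distance(uint64_t miss_latency_cycles,
                                uint64_t iter_cycles)
{
    assert(iter_cycles > 0);
    return (size_t)((miss_latency_cycles + iter_cycles - 1) / iter_cycles);
}

/* Sum data[idx[i]] for i in [0, n). The address of each load depends on
 * idx[i], which is the kind of irregular pattern that stride-based
 * hardware prefetchers generally cannot predict. */
static int64_t indirect_sum(const int32_t *data, const uint32_t *idx,
                            size_t n, size_t dist)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Prefetch the element needed `dist` iterations from now,
         * staying inside the bounds of idx. */
        if (dist < n - i) {
            /* rw = 0 (read), locality = 3 (keep in all cache levels) */
            __builtin_prefetch(&data[idx[i + dist]], 0, 3);
        }
        sum += data[idx[i]];
    }
    return sum;
}
```

The prefetch only changes timing, never the result, so `indirect_sum` returns the same value for any `dist`. If the distance is too small the data arrives late, and if it is too large the line may be evicted before use, which is why a fixed, statically chosen distance can underperform one derived from measured execution times.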
Funding Information
- Intel Corporation