Using a user-level memory thread for correlation prefetching

1 May 2002

journal article
Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News

Vol. 30 (2), 171-182
https://doi.org/10.1145/545214.545235

Abstract

This paper introduces the idea of using a User-Level Memory Thread (ULMT) for correlation prefetching. In this approach, a user thread runs on a general-purpose processor located in main memory, either in the memory controller chip or in a DRAM chip. The thread performs correlation prefetching in software and sends the prefetched data into the L2 cache of the main processor. The approach requires minimal hardware beyond the memory processor: the correlation table is a software data structure that resides in main memory, and the main processor only needs a few modifications to its L2 cache so that it can accept incoming prefetches. The approach is also widely usable, as it can prefetch effectively even for irregular applications. Finally, it is very flexible, as the user can customize the prefetching algorithm on a per-application basis. Our simulation results show that, with a new design of the correlation table and prefetching algorithm, our scheme delivers good speedups: nine mostly-irregular applications show an average speedup of 1.32. Furthermore, our scheme works well in combination with a conventional processor-side sequential prefetcher, in which case the average speedup increases to 1.46. In addition, by customizing the prefetching algorithm for each application, we increase the average speedup to 1.53.
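Correlation prefetching records which miss addresses tend to follow one another and, on a later miss to the same address, prefetches the recorded successors. The following is a minimal Python model of that idea running as a software loop, assuming a simple pair-based table in the style of Markov-predictor prefetching. It is not the paper's new table design, which the abstract does not describe. The names `CorrelationTable`, `ulmt`, `num_rows`, `num_succ`, and `depth` are hypothetical, and the "push into L2" step is modeled by simply returning the addresses.

```python
from collections import OrderedDict


class CorrelationTable:
    """Hypothetical, simplified software correlation table kept in ordinary
    memory: each row maps a miss address to its most recent successor miss
    addresses, most-recently-used first."""

    def __init__(self, num_rows=1024, num_succ=2):
        self.num_rows = num_rows    # capacity in rows (assumed parameter)
        self.num_succ = num_succ    # successors kept per row (assumed parameter)
        self.rows = OrderedDict()   # insertion order doubles as LRU order

    def record(self, prev_miss, miss):
        """Learning step: note that `miss` followed `prev_miss`."""
        succ = self.rows.pop(prev_miss, [])
        if miss in succ:
            succ.remove(miss)
        succ.insert(0, miss)            # make `miss` the MRU successor
        del succ[self.num_succ:]        # drop the least recent successors
        self.rows[prev_miss] = succ     # re-insert row as most recently updated
        if len(self.rows) > self.num_rows:
            self.rows.popitem(last=False)  # evict the least recently updated row

    def lookup(self, miss):
        """Return a copy of the recorded successors of `miss` (empty if none)."""
        return list(self.rows.get(miss, []))


def ulmt(miss_stream, table, depth=1):
    """Model a memory-side thread that sees each L2 miss address and returns,
    per miss, the addresses it would prefetch. `depth` is a hypothetical
    per-application knob: how many levels of successors to chain."""
    issued_per_miss = []
    prev = None
    for miss in miss_stream:
        # Prefetching step first, so prefetches go out as early as possible.
        issued, frontier = [], [miss]
        for _ in range(depth):
            next_frontier = []
            for addr in frontier:
                for succ in table.lookup(addr):
                    if succ != miss and succ not in issued:
                        issued.append(succ)
                        next_frontier.append(succ)
            frontier = next_frontier
        # In the real system these addresses would be pushed into the
        # main processor's L2 cache; here they are just collected.
        issued_per_miss.append(issued)
        # Learning step: record the (previous miss -> this miss) pair.
        if prev is not None:
            table.record(prev, miss)
        prev = miss
    return issued_per_miss


if __name__ == "__main__":
    # A repeating, pointer-chasing-like miss sequence A -> B -> C -> A ...
    print(ulmt(list("ABCABC"), CorrelationTable()))
    # [[], [], [], ['B'], ['C'], ['A']]
    print(ulmt(list("ABCABC"), CorrelationTable(), depth=2))
    # [[], [], [], ['B', 'C'], ['C', 'A'], ['A', 'B']]
```

The sketch looks up successors before updating the table so that prefetches are issued with as little delay as possible after a miss; learning happens afterward. Because the table and loop are ordinary software, changing `num_succ` or `depth` per application needs no hardware change. This illustrates the kind of per-application customization the abstract refers to, although the abstract does not say which parameters the authors actually tuned.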
