Bypass and insertion algorithms for exclusive last-level caches

4 June 2011

journal article
conference paper
Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News

Vol. 39 (3), 81-92
https://doi.org/10.1145/2024723.2000075

Abstract

Inclusive last-level caches (LLCs) waste precious silicon estate due to cross-level replication of cache blocks. As the industry moves toward cache hierarchies with larger inner levels, this wasted cache space leads to bigger performance losses compared to exclusive LLCs. However, exclusive LLCs make the design of replacement policies more challenging. While in an inclusive LLC a block can gather a filtered access history, this is not possible in an exclusive design because the block is de-allocated from the LLC on a hit. As a result, the popular least-recently-used replacement policy and its approximations are rendered ineffective and proper choice of insertion ages of cache blocks becomes even more important in exclusive designs. On the other hand, it is not necessary to fill every block into an exclusive LLC. This is known as selective cache bypassing and is not possible to implement in an inclusive LLC because that would violate inclusion. This paper explores insertion and bypass algorithms for exclusive LLCs. Our detailed execution-driven simulation results show that a combination of our best insertion and bypass policies delivers an improvement of up to 61.2% and on average (geometric mean) 3.4% in terms of instructions retired per cycle (IPC) for 97 single-threaded dynamic instruction traces spanning selected SPEC 2006 and server applications, running on a 2 MB 16-way exclusive LLC compared to a baseline exclusive design in the presence of well-tuned multi-stream hardware prefetchers. The corresponding improvements in throughput for 35 4-way multi-programmed workloads running with an 8 MB 16-way shared exclusive LLC are 20.6% (maximum) and 2.5% (geometric mean).
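
The mechanics described in the abstract can be illustrated with a toy model of one exclusive LLC set. This is a minimal sketch under stated assumptions, not the paper's algorithm: the class name `ExclusiveSet`, the 2-bit age range `MAX_AGE`, the aging-based victim selection, and the caller-supplied `insert_age` and `bypass` arguments are all hypothetical. It shows only the three behaviors the abstract names: a block leaves the LLC on a hit, a block enters with a chosen insertion age, and a fill may be skipped entirely.

```python
from dataclasses import dataclass, field

MAX_AGE = 3  # assumed 2-bit age; a block at MAX_AGE is the next victim


@dataclass
class ExclusiveSet:
    """Toy model of one set of an exclusive LLC (illustrative only)."""
    ways: int
    blocks: dict = field(default_factory=dict)  # tag -> age

    def lookup(self, tag):
        """Return True on a hit. The hit block moves to the inner level,
        so it is de-allocated here and gathers no further history."""
        if tag in self.blocks:
            del self.blocks[tag]
            return True
        return False

    def fill(self, tag, insert_age, bypass=False):
        """Handle a block evicted from the inner level.
        Returns the tag of the LLC victim, or None if nothing was evicted."""
        if bypass:
            return None  # selective bypass: the block is never filled
        if tag in self.blocks:
            self.blocks[tag] = insert_age
            return None
        victim = None
        if len(self.blocks) >= self.ways:
            # Age every block until at least one reaches MAX_AGE.
            while not any(a == MAX_AGE for a in self.blocks.values()):
                for t in self.blocks:
                    self.blocks[t] += 1
            victim = next(t for t, a in self.blocks.items() if a == MAX_AGE)
            del self.blocks[victim]
        self.blocks[tag] = insert_age
        return victim


if __name__ == "__main__":
    s = ExclusiveSet(ways=2)
    s.fill("A", insert_age=0)        # predicted to be reused soon
    s.fill("B", insert_age=MAX_AGE)  # predicted dead: first in line to go
    print(s.fill("C", insert_age=0))            # B is the victim
    print(s.lookup("A"), s.lookup("A"))         # hit, then miss
    print(s.fill("D", insert_age=0, bypass=True), "D" in s.blocks)
```

Because a hit removes the block, the age assigned at fill time is the only replacement information the set ever holds for that block, which is why the choice of insertion age and the decision to bypass carry so much weight in an exclusive design.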
