Cooperative Caching for Chip Multiprocessors

1 May 2006

journal article
Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News

Vol. 34 (2), 264-276
https://doi.org/10.1145/1150019.1136509

Abstract

This paper presents CMP Cooperative Caching, a unified framework to manage a CMP's aggregate on-chip cache resources. Cooperative caching combines the strengths of private and shared cache organizations by forming an aggregate "shared" cache through cooperation among private caches. Locally active data are attracted to the private caches by their accessing processors to reduce remote on-chip references, while globally active data are cooperatively identified and kept in the aggregate cache to reduce off-chip accesses. Examples of cooperation include cache-to-cache transfers of clean data, replication-aware data replacement, and global replacement of inactive data. These policies can be implemented by modifying an existing cache replacement policy and cache coherence protocol, or through the new directory-based protocol implementation presented in this paper. Our evaluation using full-system simulation shows that cooperative caching achieves an off-chip miss rate similar to that of a shared cache and a local cache hit rate similar to that of private caches. Cooperative caching performs robustly over a range of system/cache sizes and memory latencies. For an 8-core CMP with 1MB L2 cache per core, the best cooperative caching scheme improves the performance of multithreaded commercial workloads by 5-11% compared with a shared cache and 4-38% compared with private caches. For a 4-core CMP running multiprogrammed SPEC2000 workloads, cooperative caching is on average 11% and 6% faster than shared and private cache organizations, respectively.
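The abstract names three cooperation policies without spelling out their mechanics. The Python sketch below is a toy model built on our own assumptions, not the paper's protocol: each private cache is a fully associative LRU list, a global scan of all caches stands in for the directory, only reads are modeled (no coherence on writes), and "global replacement of inactive data" is approximated by spilling an evicted singlet (a block with no other on-chip copy) into a peer cache, in the spirit of N-chance forwarding from cooperative file caching. The spill target is the neighboring core, chosen here only for reproducibility. The class, method, and parameter names (`CooperativeCaches`, `access`, `max_forwards`) are hypothetical.

```python
from collections import OrderedDict


class CooperativeCaches:
    """Hypothetical toy model of N cooperating private caches (reads only)."""

    def __init__(self, num_cores, capacity, max_forwards=1):
        # One LRU-ordered dict per core: block address -> times it was spilled.
        # Iteration order runs from least to most recently used.
        self.caches = [OrderedDict() for _ in range(num_cores)]
        self.capacity = capacity
        self.max_forwards = max_forwards  # assumed N in "N-chance" spilling
        self.stats = {"local": 0, "remote": 0, "offchip": 0}

    def _holders(self, addr, exclude):
        # Stand-in for a directory lookup: which other cores hold addr?
        return [c for c, cache in enumerate(self.caches)
                if c != exclude and addr in cache]

    def _pick_victim(self, core):
        cache = self.caches[core]
        # Replication-aware replacement: evict the least recently used block
        # that also lives in another cache, so no unique on-chip data is lost.
        for addr in cache:
            if self._holders(addr, core):
                return addr
        return next(iter(cache))  # otherwise fall back to the LRU singlet

    def _insert(self, core, addr, forwards, allow_spill):
        cache = self.caches[core]
        if len(cache) >= self.capacity:
            victim = self._pick_victim(core)
            victim_forwards = cache.pop(victim)
            is_singlet = not self._holders(victim, core)
            # Spill a singlet to a peer instead of dropping it, at most
            # max_forwards times. A victim displaced by a spill is simply
            # dropped (a simplification to avoid cascading spills).
            if allow_spill and is_singlet and victim_forwards < self.max_forwards:
                host = (core + 1) % len(self.caches)
                self._insert(host, victim, victim_forwards + 1, allow_spill=False)
        cache[addr] = forwards  # new block enters at the MRU position

    def access(self, core, addr):
        cache = self.caches[core]
        if addr in cache:
            cache.move_to_end(addr)
            self.stats["local"] += 1
            return "local"
        if self._holders(addr, core):
            # Cache-to-cache transfer of clean data: the peer keeps its copy
            # and the requester gets a replica instead of going off-chip.
            self.stats["remote"] += 1
            self._insert(core, addr, 0, allow_spill=True)
            return "remote"
        self.stats["offchip"] += 1
        self._insert(core, addr, 0, allow_spill=True)
        return "offchip"
```

For example, with two cores and two blocks per cache, a third off-chip miss on core 0 spills its LRU singlet into core 1, so a later access to that block on core 0 is served on-chip (`"remote"`) instead of going off-chip. Preferring replicated victims over plain LRU trades a slightly worse local hit rate for keeping more distinct blocks in the aggregate cache, which is the balance the abstract describes.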
