Fundamental Latency Trade-off in Architecting DRAM Caches: Outperforming Impractical SRAM-Tags with a Simple and Practical Design

Abstract
This paper analyzes the design trade-offs in architecting large-scale DRAM caches. Prior research, including the recent work from Loh and Hill, has organized DRAM caches similarly to conventional caches. In this paper, we contend that some of the basic design decisions typically made for conventional caches (such as serialization of tag and data access, large associativity, and update of replacement state) are detrimental to the performance of DRAM caches, as they exacerbate the already high hit latency. We show that higher performance can be obtained by optimizing the DRAM cache architecture first for latency, and then for hit rate. We propose a latency-optimized cache architecture, called Alloy Cache, that eliminates the delay due to tag serialization by streaming tag and data together in a single burst. We also propose a simple and highly effective Memory Access Predictor that incurs a storage overhead of 96 bytes per core and a latency of 1 cycle. In the common case, it allows cache misses to be serviced without waiting for miss detection. Our evaluations show that our latency-optimized cache design significantly outperforms both the recent proposal from Loh and Hill and an impractical SRAM Tag-Store design that incurs an unacceptable overhead of several tens of megabytes. On average, the proposal from Loh and Hill provides 8.7% performance improvement, the "idealized" SRAM Tag design provides 24%, and our simple latency-optimized design provides 35%.
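To make the Memory Access Predictor concrete, below is a minimal sketch in C of one plausible organization: a per-core table of saturating counters indexed by a hash of the instruction address. The abstract only states the 96-byte-per-core budget and 1-cycle latency; the table size (256 three-bit counters, which fits 96 bytes), the index hash, and all function names here are illustrative assumptions, not the paper's exact design.

```c
/*
 * Hypothetical sketch of a per-core memory access predictor.
 * Assumption: a table of saturating counters indexed by a hash of the
 * instruction (PC) address. 256 x 3-bit counters = 96 bytes, matching the
 * storage budget quoted in the abstract; the exact organization is assumed.
 */
#include <stdint.h>
#include <stdbool.h>

#define MAP_ENTRIES 256          /* 256 counters per core                  */
#define MAP_MAX     7            /* 3-bit saturating counter range: 0..7   */

typedef struct {
    uint8_t ctr[MAP_ENTRIES];    /* 3-bit counters (one byte each here for simplicity) */
} map_predictor_t;

/* Hash the instruction address into a table index. */
static inline unsigned map_index(uint64_t pc)
{
    return (unsigned)(pc >> 2) & (MAP_ENTRIES - 1);
}

/* Predict whether this access will hit in the DRAM cache.
 * On a predicted miss, the off-chip memory access can be started in
 * parallel with the DRAM-cache probe instead of waiting for miss detection. */
static inline bool map_predict_hit(const map_predictor_t *p, uint64_t pc)
{
    return p->ctr[map_index(pc)] > MAP_MAX / 2;
}

/* Train the counter with the actual outcome once the cache probe resolves. */
static inline void map_update(map_predictor_t *p, uint64_t pc, bool was_hit)
{
    unsigned i = map_index(pc);
    if (was_hit) {
        if (p->ctr[i] < MAP_MAX) p->ctr[i]++;
    } else {
        if (p->ctr[i] > 0) p->ctr[i]--;
    }
}
```

The single table lookup and counter comparison are consistent with the 1-cycle prediction latency claimed in the abstract, since no tag access is required before issuing the predicted-miss request to memory.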