Fundamental Latency Trade-off in Architecting DRAM Caches: Outperforming Impractical SRAM-Tags with a Simple and Practical Design
- 1 December 2012
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 235-246
- https://doi.org/10.1109/micro.2012.30
Abstract
This paper analyzes the design trade-offs in architecting large-scale DRAM caches. Prior research, including the recent work from Loh and Hill, has organized DRAM caches similarly to conventional caches. In this paper, we contend that some of the basic design decisions typically made for conventional caches (such as serialization of tag and data access, large associativity, and update of replacement state) are detrimental to the performance of DRAM caches, as they exacerbate the already high hit latency. We show that higher performance can be obtained by optimizing the DRAM cache architecture first for latency, and then for hit rate. We propose a latency-optimized cache architecture, called Alloy Cache, that eliminates the delay due to tag serialization by streaming tag and data together in a single burst. We also propose a simple and highly effective Memory Access Predictor that incurs a storage overhead of 96 bytes per core and a latency of 1 cycle. It helps service cache misses faster without the need to wait for cache miss detection in the common case. Our evaluations show that our latency-optimized cache design significantly outperforms both the recent proposal from Loh and Hill and an impractical SRAM Tag-Store design that incurs an unacceptable overhead of several tens of megabytes. On average, the proposal from Loh and Hill provides 8.7% performance improvement, the "idealized" SRAM Tag design provides 24%, and our simple latency-optimized design provides 35%.
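To illustrate the kind of mechanism the abstract describes, here is a minimal sketch of a hit/miss predictor built from a per-core table of saturating counters indexed by a hash of the memory instruction's address. The table size, counter width, and indexing scheme are assumptions chosen to match the stated 96-byte budget (256 entries of 3 bits each), not the authors' exact design.

```python
class MemoryAccessPredictor:
    """Hedged sketch of a DRAM-cache hit/miss predictor.

    Assumption: 256 three-bit saturating counters per core
    (256 * 3 bits = 96 bytes), indexed by instruction address.
    A counter at or above the threshold predicts "hit", so a
    predicted miss can be sent to memory without waiting for
    cache-miss detection.
    """

    def __init__(self, entries=256, max_count=7, threshold=4):
        # Start biased toward "hit" so cold entries probe the cache.
        self.table = [max_count] * entries
        self.entries = entries
        self.max_count = max_count
        self.threshold = threshold

    def _index(self, pc):
        # Simple modulo hash of the instruction address (assumed).
        return pc % self.entries

    def predict_hit(self, pc):
        return self.table[self._index(pc)] >= self.threshold

    def update(self, pc, was_hit):
        # Increment toward "hit" on a hit, decrement on a miss.
        i = self._index(pc)
        if was_hit:
            self.table[i] = min(self.max_count, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
```

A saturating counter adapts quickly: a few consecutive misses from one instruction flip its prediction to "miss", letting subsequent requests start the memory access immediately.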
References
- Supporting Very Large DRAM Caches with Compound-Access Scheduling and MissMap. IEEE Micro, 2012
- Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management. IEEE Computer Architecture Letters, 2012
- Efficiently enabling conventional block sizes for very large die-stacked DRAM caches. ACM, 2011
- SHiP. ACM, 2011
- Simple but Effective Heterogeneous Main Memory with On-Chip Memory Controller Support. IEEE, 2010
- CHOP: Adaptive filter-based DRAM caching for CMP server platforms. IEEE, 2010
- Optimizing communication and capacity in a 3D stacked reconfigurable cache hierarchy. IEEE, 2009
- Exploring DRAM cache architectures for CMP server platforms. IEEE, 2007
- Adaptive insertion policies for high performance caching. ACM, 2007
- Using SimPoint for accurate and efficient simulation. ACM, 2003