FIGARO: Improving System Performance via Fine-Grained In-DRAM Data Relocation and Caching

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE) in 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

p. 313-328
https://doi.org/10.1109/micro50266.2020.00036

Abstract

Main memory, composed of DRAM, is a performance bottleneck for many applications, due to the high DRAM access latency. In-DRAM caches work to mitigate this latency by augmenting regular-latency DRAM with small-but-fast regions of DRAM that serve as a cache for the data held in the regular-latency (i.e., slow) region of DRAM. While an effective in-DRAM cache can allow a large fraction of memory requests to be served from a fast DRAM region, the latency savings are often hindered by inefficient mechanisms for migrating (i.e., relocating) copies of data into and out of the fast regions. Existing in-DRAM caches have two sources of inefficiency: (1) their data relocation granularity is an entire multi-kilobyte row of DRAM, even though much of the row may never be accessed due to poor data locality; and (2) because the relocation latency increases with the physical distance between the slow and fast regions, multiple fast regions are physically interleaved among slow regions to reduce the relocation latency, resulting in increased hardware area and manufacturing complexity.
We propose a new substrate, FIGARO, that uses existing shared global buffers among subarrays within a DRAM bank to provide support for in-DRAM data relocation across subarrays at the granularity of a single cache block. FIGARO has a distance-independent latency within a DRAM bank, and avoids complex modifications to DRAM (such as the interleaving of fast and slow regions). Using FIGARO, we design a fine-grained in-DRAM cache called FIGCache. The key idea of FIGCache is to cache only small, frequently-accessed portions of different DRAM rows in a designated region of DRAM. By caching only the parts of each row that are expected to be accessed in the near future, we can pack more of the frequently-accessed data into FIGCache, and can benefit from additional row hits in DRAM (i.e., accesses to an already-open row, which have a lower latency than accesses to an unopened row).
FIGCache provides benefits for systems with both heterogeneous DRAM banks (i.e., banks with fast regions and slow regions) and conventional homogeneous DRAM banks (i.e., banks with only slow regions).
Our evaluations across a wide variety of applications show that FIGCache improves the average performance of a system using DDR4 DRAM by 16.3% and reduces average DRAM energy consumption by 7.8% for 8-core workloads, over a conventional system without in-DRAM caching. We show that FIGCache outperforms state-of-the-art in-DRAM caching techniques, and that its performance gains are robust across many system and mechanism parameters.
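
The capacity argument in the abstract (relocating only the accessed portions of a row lets more hot data fit in the fast region) can be illustrated with a toy model. The sketch below is not the paper's mechanism or evaluation: it assumes an LRU-managed fast region, 128 cache blocks per row (e.g., an 8 KiB row with 64 B blocks), and a synthetic trace with poor row locality. The names `hit_rate` and `build_trace` and all parameter values are hypothetical, chosen only for illustration.

```python
from collections import OrderedDict

BLOCKS_PER_ROW = 128  # assumption: 8 KiB row / 64 B cache block


def hit_rate(trace, capacity_blocks, granularity):
    """Hit rate of an LRU-managed fast region.

    trace           -- list of (row, block) accesses
    capacity_blocks -- fast-region capacity, in cache blocks
    granularity     -- blocks relocated per miss
                       (1 = single block, BLOCKS_PER_ROW = whole row)
    """
    cache = OrderedDict()
    capacity_entries = capacity_blocks // granularity
    hits = 0
    for row, block in trace:
        key = (row, block // granularity)
        if key in cache:
            cache.move_to_end(key)  # refresh LRU position
            hits += 1
        else:
            cache[key] = True
            if len(cache) > capacity_entries:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(trace)


def build_trace(rows=16, hot_blocks=(0, 37, 64, 101), repeats=10):
    """Synthetic trace: a few hot blocks per row, rows interleaved."""
    trace = []
    for _ in range(repeats):
        for block in hot_blocks:
            for row in range(rows):
                trace.append((row, block))
    return trace


if __name__ == "__main__":
    trace = build_trace()
    capacity = 2 * BLOCKS_PER_ROW  # fast region holds two rows' worth of blocks
    print("row granularity:  ", hit_rate(trace, capacity, BLOCKS_PER_ROW))
    print("block granularity:", hit_rate(trace, capacity, 1))
```

In this toy setup the row-granularity cache holds only two rows and thrashes, while the block-granularity cache holds all 64 hot blocks after the first pass. The model ignores relocation latency, row-buffer hits, and energy, all of which the paper evaluates.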

Funding Information

National University of Defense Technology

Cited by 19 articles