Kangaroo

Open Access

26 October 2021

conference paper
conference paper
Published by Association for Computing Machinery (ACM)

https://doi.org/10.1145/3477132.3483568

Abstract

Many social-media and IoT services have very large working sets consisting of billions of tiny (≈100 B) objects. Large, flash-based caches are important to serving these working sets at acceptable monetary cost. However, caching tiny objects on flash is challenging for two reasons: (i) SSDs can read/write data only in multi-KB "pages" that are much larger than a single object, stressing the limited number of times flash can be written; and (ii) very few bits per cached object can be kept in DRAM without losing flash's cost advantage. Unfortunately, existing flash-cache designs fall short of addressing these challenges: write-optimized designs require too much DRAM, and DRAM-optimized designs require too many flash writes. We present Kangaroo, a new flash-cache design that optimizes both DRAM usage and flash writes to maximize cache performance while minimizing cost. Kangaroo combines a large, set-associative cache with a small, log-structured cache. The set-associative cache requires minimal DRAM, while the log-structured cache minimizes Kangaroo's flash writes. Experiments using traces from Facebook and Twitter show that Kangaroo achieves DRAM usage close to the best prior DRAM-optimized design, flash writes close to the best prior write-optimized design, and miss ratios better than both. Kangaroo's design is Pareto-optimal across a range of allowed write rates, DRAM sizes, and flash sizes, reducing misses by 29% over the state of the art. These results are corroborated with a test deployment of Kangaroo in a production flash cache at Facebook.

Keywords

This publication has 19 references indexed in Scilit:

Whirlpool
Published by Association for Computing Machinery (ACM) ,2016
Talus: A simple way to remove cliffs in cache performance
Published by Institute of Electrical and Electronics Engineers (IEEE) ,2015
Exact analysis of TTL cache networks
Performance Evaluation, 2014
Performance evaluation of the random replacement policy for networks of caches
ACM SIGMETRICS Performance Evaluation Review, 2012
SILT
Published by Association for Computing Machinery (ACM) ,2011
Performance of the move-to-front algorithm with Markov-modulated request sequences
Operations Research Letters, 1999
A comparison of trace-sampling techniques for multi-megabyte caches
International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1994
The design and implementation of a log-structured file system
Published by Association for Computing Machinery (ACM) ,1991
An approximate analysis of the LRU and FIFO buffer replacement schemes
Published by Association for Computing Machinery (ACM) ,1990
Principles of Optimal Page Replacement
Journal of the ACM, 1971

Cited by 9 articles