HeMem

Open Access

26 October 2021

conference paper
Published by Association for Computing Machinery (ACM)

https://doi.org/10.1145/3477132.3483550

Abstract

High-capacity non-volatile memory (NVM) is a new main memory tier. Tiered DRAM+NVM servers increase total memory capacity by up to 8x, but can diminish memory bandwidth by up to 7x and inflate latency by up to 63% if not managed well. We study existing hardware and software tiered memory management systems on the recently available Intel Optane DC NVM with big data applications and find that no existing system maximizes application performance on real NVM. Based on our findings, we present HeMem, a tiered main memory management system designed from scratch for commercially available NVM and the big data applications that use it. HeMem manages tiered memory asynchronously, batching and amortizing the overheads of memory access tracking, migration, and the associated TLB synchronization. HeMem monitors application memory use by sampling memory accesses via CPU events rather than by scanning page tables. This allows HeMem to scale to terabytes of memory, keep small and ephemeral data structures in fast memory, and allocate scarce, asymmetric NVM bandwidth according to access patterns. Finally, HeMem is flexible because it places per-application memory management policy at user level. On a system with Intel Optane DC NVM, HeMem outperforms hardware, OS, and PL-based tiered memory management, providing up to 50% runtime reduction for the GAP graph processing benchmark, 13% higher throughput for TPC-C on the Silo in-memory database, 16% lower tail latency under performance isolation for a key-value store, and up to 10x less NVM wear than the next best solution, all without application modification.
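
The general idea of tracking hotness from sampled accesses and migrating pages in batches can be illustrated with a small simulation. This is only a sketch under stated assumptions, not HeMem's implementation: the class `TierTracker`, the constants `PAGE_SIZE` and `HOT_THRESHOLD`, and the halving-based cooling step are all hypothetical, and the "samples" are plain integers standing in for addresses that a real system would obtain from CPU event sampling.

```python
from collections import Counter

PAGE_SIZE = 4096      # hypothetical page size in bytes
HOT_THRESHOLD = 4     # hypothetical: samples needed before a page counts as hot


class TierTracker:
    """Toy model of a two-tier (DRAM + NVM) page placement policy."""

    def __init__(self, dram_capacity_pages):
        self.dram_capacity = dram_capacity_pages
        self.counts = Counter()   # page number -> sampled access count
        self.in_dram = set()      # pages currently placed in the fast tier

    def record_samples(self, addresses):
        # Each sampled address is attributed to the page that contains it.
        for addr in addresses:
            self.counts[addr // PAGE_SIZE] += 1

    def plan_migrations(self):
        # Choose the hottest pages that fit in DRAM, then compute one batch
        # of promotions and demotions instead of migrating on every access.
        hot = [p for p, c in self.counts.most_common() if c >= HOT_THRESHOLD]
        want = hot[: self.dram_capacity]
        promote = [p for p in want if p not in self.in_dram]
        # Candidate victims are DRAM pages no longer wanted, coldest first.
        victims = sorted(self.in_dram - set(want), key=lambda p: self.counts[p])
        free = self.dram_capacity - len(self.in_dram)
        demote = victims[: max(0, len(promote) - free)]
        self.in_dram.difference_update(demote)
        self.in_dram.update(promote)
        return promote, demote

    def cool(self):
        # Halve all counts so that old accesses matter less over time.
        for page in list(self.counts):
            self.counts[page] //= 2
            if self.counts[page] == 0:
                del self.counts[page]
```

A caller would feed batches of sampled addresses to `record_samples`, periodically call `plan_migrations` to obtain the pages to move in each direction, and call `cool` between rounds. Batching the decision in this way is what lets the per-access cost stay small in the sketch.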

Funding Information

National Science Foundation (NSF), grants 1719061 and 2008884

Cited by 27 articles