Abstract
Intel Optane memory offers non-volatility, byte addressability, and high capacity. It suits managed workloads that prefer large main-memory heaps. We investigate Optane as the main memory for managed (Java) workloads, focusing on performance scalability: how does performance relative to DRAM change as core counts grow? A few workloads incur only a slight slowdown on Optane, so running them there conserves scarce DRAM capacity. Unfortunately, other workloads scale poorly beyond a few cores. This article investigates scaling bottlenecks for Java workloads on Optane memory, analyzing application, runtime, and microarchitectural interactions. Poorly scaling workloads allocate objects rapidly and frequently access objects in Optane memory. These characteristics slow down the mutator and substantially slow down garbage collection (GC). At the microarchitectural level, load, store, and instruction-miss penalties rise. To regain performance, we partition heaps across DRAM and Optane memory, a hybrid that scales considerably better than Optane alone. We exploit state-of-the-art GC approaches to partition heaps, but existing approaches needlessly waste DRAM capacity because they ignore runtime behavior. This article therefore introduces performance impact-guided memory allocation (PIMA) for hybrid memories. PIMA maximizes Optane utilization, allocating in DRAM only when doing so improves performance. It estimates the performance impact of placing the heap in either memory type by sampling. We target PIMA at graph-analytics workloads, offering a novel performance-estimation method and a detailed evaluation. PIMA identifies workload phases that benefit from DRAM with high (94.33%) accuracy while incurring only a 2% sampling overhead. PIMA operates stand-alone or combines with prior approaches to offer new performance versus DRAM-capacity trade-offs. This work opens up Optane memory to a practical role as the main memory for Java workloads.
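To make the sampling-based decision concrete, the sketch below illustrates the kind of logic the abstract describes: briefly sample a workload phase and place its heap in DRAM only if the sample shows a meaningful speedup over Optane. This is a minimal, hypothetical illustration, not the authors' implementation; all names (`MemoryTier`, `sample`, `choosePlacement`) and the threshold value are invented for exposition, and the sampled slice here is a stand-in for a real phase running with its heap on each tier.

```java
// Hypothetical sketch of a PIMA-style placement decision.
// Names and logic are illustrative only; the paper's actual mechanism
// operates inside the JVM/GC, not at the application level.

import java.util.function.Supplier;

enum MemoryTier { DRAM, OPTANE }

public class PimaSketch {
    // Run a short, representative slice of a workload phase and
    // report its elapsed time in nanoseconds.
    static long sample(Supplier<?> phaseSlice) {
        long start = System.nanoTime();
        phaseSlice.get();
        return System.nanoTime() - start;
    }

    // Prefer Optane (to conserve DRAM) unless the DRAM sample is
    // meaningfully faster -- mirroring "allocate in DRAM only if it
    // improves performance". The 10% threshold is an assumption.
    static MemoryTier choosePlacement(long dramNanos, long optaneNanos,
                                      double speedupThreshold) {
        double speedup = (double) optaneNanos / dramNanos;
        return speedup > speedupThreshold ? MemoryTier.DRAM : MemoryTier.OPTANE;
    }

    public static void main(String[] args) {
        // Placeholder workload slice; in reality the same phase would be
        // sampled once with its heap in DRAM and once in Optane.
        Supplier<Long> slice = () -> {
            long acc = 0;
            for (int i = 0; i < 10_000_000; i++) acc += i;
            return acc;
        };
        long dramNanos = sample(slice);   // heap sampled in DRAM
        long optaneNanos = sample(slice); // heap sampled in Optane
        System.out.println("Place phase in: "
                + choosePlacement(dramNanos, optaneNanos, 1.10));
    }
}
```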