Optimizing Checkpoints Using NVM as Virtual Memory
- 1 May 2013
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in 2013 IEEE 27th International Symposium on Parallel and Distributed Processing
Abstract
Rapid checkpointing will remain key functionality for next-generation high-end machines. This paper explores the use of node-local nonvolatile memories (NVM), such as phase-change memory, to provide frequent, low-overhead checkpoints. By adapting existing multi-level checkpoint techniques, we devise new methods, termed NVM-checkpoints, that efficiently store checkpoints on both local and remote node NVM. The checkpoint frequencies are guided by failure models that capture the expected accessibility of such data after failure. To lower overheads, NVM-checkpoints reduce the NVM and interconnect bandwidth used with a novel pre-copy mechanism, which incrementally moves checkpoint data from DRAM to NVM before a local checkpoint is started. This reduces local checkpoint cost by limiting the instantaneous data volume moved at checkpoint time, thereby freeing bandwidth for use by applications. In fact, the pre-copy method can reduce peak interconnect usage by up to 46%. Since our approach treats NVM as memory rather than as a 'Ramdisk', pre-copying can be generalized to directly move data to remote NVMs. This results in 40% faster application execution times compared to asynchronous approaches not using pre-copying.
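The pre-copy idea described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the authors' implementation: DRAM and NVM are modeled as Python lists of chunks, dirty tracking is an explicit set, and the names `PreCopyCheckpointer`, `pre_copy`, `budget`, and `run_demo` are hypothetical. It shows only that copying dirty chunks ahead of time reduces the number of chunks that must move at checkpoint time.

```python
class PreCopyCheckpointer:
    """Toy model of pre-copy checkpointing (hypothetical, for illustration).

    DRAM and NVM are lists of chunks; `dirty` holds the indices of chunks
    modified in DRAM since they were last copied to NVM.
    """

    def __init__(self, num_chunks):
        self.dram = [0] * num_chunks
        self.nvm = [0] * num_chunks
        self.dirty = set()

    def write(self, idx, value):
        # An application write updates DRAM and marks the chunk dirty.
        self.dram[idx] = value
        self.dirty.add(idx)

    def pre_copy(self, budget):
        # Before the checkpoint, move at most `budget` dirty chunks to NVM.
        moved = 0
        for idx in sorted(self.dirty)[:budget]:
            self.nvm[idx] = self.dram[idx]
            self.dirty.discard(idx)
            moved += 1
        return moved

    def checkpoint(self):
        # At checkpoint time, only chunks that are still dirty are copied.
        remaining = sorted(self.dirty)
        for idx in remaining:
            self.nvm[idx] = self.dram[idx]
        self.dirty.clear()
        return len(remaining)


def run_demo():
    ckpt = PreCopyCheckpointer(num_chunks=8)
    for i in range(6):
        ckpt.write(i, i + 100)        # six chunks become dirty
    pre = ckpt.pre_copy(budget=4)     # four of them move early
    ckpt.write(0, 999)                # chunk 0 is re-dirtied afterwards
    at_ckpt = ckpt.checkpoint()       # only the remaining dirty chunks move now
    return pre, at_ckpt, ckpt.nvm == ckpt.dram


print(run_demo())
```

In this toy run, three chunks are copied at checkpoint time instead of six, at the cost of copying chunk 0 twice. A real system would also have to weigh that re-copy cost for frequently modified data, which this sketch does not model.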