Assessment of the Effect of Memory Page Retirement on System RAS Against Hardware Faults
- 21 July 2006
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 365-370
- https://doi.org/10.1109/dsn.2006.13
Abstract
The Solaris 10 operating system includes a number of new features for predictive self-healing. One such feature is the ability of the fault management software to diagnose memory errors and drive automatic memory page retirement (MPR), intended to reduce the negative impact of permanent memory faults that generate either correctable or uncorrectable errors on system reliability, availability, and serviceability (RAS). The MPR technique allows memory pages suffering from correctable errors and relocatable clean pages suffering from uncorrectable errors to be removed from use in the virtual memory system without interrupting user applications. It also allows relocatable dirty pages associated with uncorrectable errors to be isolated with limited impact on affected user processes, avoiding an outage for the entire system. This study applies analytical models, with parameters calibrated by field experience, to quantify the reduction that can be made by this operating system self-healing technique on the system interruptions, yearly downtime, and number of services introduced by hardware permanent faults, for typical low-end and mid-range server systems. The results show that significant improvements can be made on these three system RAS metrics by deploying the MPR capability.
This publication has 6 references indexed in Scilit:
- Self-Healing in Modern Operating Systems. Queue, 2004
- Automating Software Failure Reporting. Queue, 2004
- Hierarchical computation of interval availability and related metrics. Published by Institute of Electrical and Electronics Engineers (IEEE), 2004
- Availability measurement and modeling for an application server. Published by Institute of Electrical and Electronics Engineers (IEEE), 2004
- The dawning of the autonomic computing era. IBM Systems Journal, 2003
- Dynamic reconfiguration: Basic building blocks for autonomic computing on IBM pSeries servers. IBM Systems Journal, 2003