Aloe: verifying reliability of approximate programs in the presence of recovery mechanisms
- 22 February 2020
- conference paper
- conference paper
- Published by Association for Computing Machinery (ACM) in Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization
Abstract
Modern hardware is becoming increasingly susceptible to silent data corruptions. As general methods for detection and recovery from errors are time and energy consuming, selective detection and recovery are promising alternatives for applications that have the freedom to produce results with a variable level of accuracy. Several programming languages have provided specialized constructs for expressing detection and recovery operations, but the existing static analyses of safety and quantitative analyses of programs do not have the proper support for such language constructs. This work presents Aloe, a quantitative static analysis of reliability of programs with recovery blocks - a construct that checks for errors, and if necessary, applies the corresponding recovery strategy. The analysis supports reasoning about both reliable and potentially unreliable detection and recovery mechanisms. It implements a novel precondition generator for recovery blocks, built on top of Rely, a state-of-the-art quantitative reliability analysis for imperative programs. Aloe can reason about programs with scalar and array expressions, if-then-else conditionals, and bounded loops without early exits. The analyzed computation is idempotent and the recovery code re-executes the original computation. We implemented Aloe and applied it to a set of eight programs previously used in approximate computing research. Our results present significantly higher reliability and scale better compared to the existing Rely analysis. Moreover, the end-to-end accuracy of the verified computations exhibits only small accuracy losses.Keywords
Funding Information
- Defense Advanced Research Projects Agency (HR0011-18-C-0122)
- National Science Foundation (CCF-1629431,CCF-1703637,CCF-1846354)
This publication has 29 references indexed in Scilit:
- CLEARPublished by Association for Computing Machinery (ACM) ,2016
- Approximate computation with outlier detection in TopazPublished by Association for Computing Machinery (ACM) ,2015
- Probability type inference for flexible approximate programmingPublished by Association for Computing Machinery (ACM) ,2015
- ChiselPublished by Association for Computing Machinery (ACM) ,2014
- Verifying quantitative reliability for programs that execute on unreliable hardwarePublished by Association for Computing Machinery (ACM) ,2013
- Static analysis and compiler design for idempotent processingPublished by Association for Computing Machinery (ACM) ,2012
- Low-cost program-level detectors for reducing silent data corruptionsPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2012
- Probabilistic accuracy bounds for fault-tolerant computations that discard tasksPublished by Association for Computing Machinery (ACM) ,2006
- Designing Reliable Systems from Unreliable Components: The Challenges of Transistor Variability and DegradationIEEE Micro, 2005
- Detailed design and evaluation of redundant multithreading alternativesACM SIGARCH Computer Architecture News, 2002