Aloe: verifying reliability of approximate programs in the presence of recovery mechanisms

conference paper
conference paper
Published by Association for Computing Machinery (ACM) in Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization

https://doi.org/10.1145/3368826.3377924

Abstract

Modern hardware is becoming increasingly susceptible to silent data corruptions. As general methods for detection and recovery from errors are time and energy consuming, selective detection and recovery are promising alternatives for applications that have the freedom to produce results with a variable level of accuracy. Several programming languages have provided specialized constructs for expressing detection and recovery operations, but the existing static analyses of safety and quantitative analyses of programs do not have the proper support for such language constructs. This work presents Aloe, a quantitative static analysis of reliability of programs with recovery blocks - a construct that checks for errors, and if necessary, applies the corresponding recovery strategy. The analysis supports reasoning about both reliable and potentially unreliable detection and recovery mechanisms. It implements a novel precondition generator for recovery blocks, built on top of Rely, a state-of-the-art quantitative reliability analysis for imperative programs. Aloe can reason about programs with scalar and array expressions, if-then-else conditionals, and bounded loops without early exits. The analyzed computation is idempotent and the recovery code re-executes the original computation. We implemented Aloe and applied it to a set of eight programs previously used in approximate computing research. Our results present significantly higher reliability and scale better compared to the existing Rely analysis. Moreover, the end-to-end accuracy of the verified computations exhibits only small accuracy losses.

Keywords

Funding Information

Defense Advanced Research Projects Agency (HR0011-18-C-0122)
National Science Foundation (CCF-1629431,CCF-1703637,CCF-1846354)

This publication has 29 references indexed in Scilit:

CLEAR
Published by Association for Computing Machinery (ACM) ,2016
Approximate computation with outlier detection in Topaz
Published by Association for Computing Machinery (ACM) ,2015
Probability type inference for flexible approximate programming
Published by Association for Computing Machinery (ACM) ,2015
Chisel
Published by Association for Computing Machinery (ACM) ,2014
Verifying quantitative reliability for programs that execute on unreliable hardware
Published by Association for Computing Machinery (ACM) ,2013
Static analysis and compiler design for idempotent processing
Published by Association for Computing Machinery (ACM) ,2012
Low-cost program-level detectors for reducing silent data corruptions
Published by Institute of Electrical and Electronics Engineers (IEEE) ,2012
Probabilistic accuracy bounds for fault-tolerant computations that discard tasks
Published by Association for Computing Machinery (ACM) ,2006
Designing Reliable Systems from Unreliable Components: The Challenges of Transistor Variability and Degradation
IEEE Micro, 2005
Detailed design and evaluation of redundant multithreading alternatives
ACM SIGARCH Computer Architecture News, 2002

Cited by 7 articles