Efficient Testing of Recovery Code Using Fault Injection
- 1 December 2011
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Computer Systems
- Vol. 29 (4), 1-38
- https://doi.org/10.1145/2063509.2063511
Abstract
A critical part of developing a reliable software system is testing its recovery code. This code is traditionally difficult to test in the lab, and, in the field, it rarely gets to run; yet, when it does run, it must execute flawlessly in order to recover the system from failure. In this article, we present a library-level fault injection engine that enables the productive use of fault injection for software testing. We describe automated techniques for reliably identifying errors that applications may encounter when interacting with their environment, for automatically identifying high-value injection targets in program binaries, and for producing efficient injection test scenarios. We present a framework for writing precise triggers that inject desired faults, in the form of error return codes and corresponding side effects, at the boundary between applications and libraries. These techniques are embodied in LFI, a new fault injection engine we are distributing at http://lfi.epfl.ch. This article includes a report of our initial experience using LFI. Most notably, LFI found 12 serious, previously unreported bugs in the MySQL database server, Git version control system, BIND name server, Pidgin IM client, and PBFT replication system with no developer assistance and no access to source code. LFI also increased recovery-code coverage from virtually zero up to 60% entirely automatically without requiring new tests or human involvement.
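The idea of a trigger that injects an error at the boundary between an application and a library can be illustrated with a minimal Python sketch. This is not LFI's interface or implementation: the names `CallCountTrigger`, `inject_fault`, and `read_config` are hypothetical, and raising `OSError` with an `errno` value stands in for the C convention of returning an error code and setting `errno` as a side effect.

```python
import errno
import os
from contextlib import contextmanager


class CallCountTrigger:
    """Hypothetical trigger: fires on the n-th intercepted call (1-based)."""

    def __init__(self, n):
        self.n = n
        self.calls = 0

    def should_inject(self, *args, **kwargs):
        self.calls += 1
        return self.calls == self.n


@contextmanager
def inject_fault(module, func_name, trigger, err):
    """Temporarily replace module.func_name with a stub.

    The stub raises OSError(err) when the trigger fires and otherwise
    forwards the call to the real function.
    """
    original = getattr(module, func_name)

    def stub(*args, **kwargs):
        if trigger.should_inject(*args, **kwargs):
            # Python analogue of "return -1 and set errno" in C.
            raise OSError(err, os.strerror(err))
        return original(*args, **kwargs)

    setattr(module, func_name, stub)
    try:
        yield
    finally:
        # Always restore the real function, even if the test body fails.
        setattr(module, func_name, original)


def read_config(path):
    """Toy application code whose recovery path we want to exercise."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError:
        return "defaults"  # recovery code under test
    try:
        return os.read(fd, 4096).decode()
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Fail the first os.open call as if the process ran out of descriptors.
    with inject_fault(os, "open", CallCountTrigger(1), errno.EMFILE):
        print(read_config(os.devnull))
```

Here the recovery branch in `read_config` runs without any change to the application code, which is the property the abstract attributes to library-level injection. A real engine would interpose on shared-library calls in a binary rather than patch a Python module.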