Unreliable failure detectors for reliable distributed systems
- 1 March 1996
- journal article
- Published by Association for Computing Machinery (ACM) in Journal of the ACM
- Vol. 43 (2), 225-267
- https://doi.org/10.1145/226643.226647
Abstract
We introduce the concept of unreliable failure detectors and study how they can be used to solve Consensus in asynchronous systems with crash failures. We characterise unreliable failure detectors in terms of two properties—completeness and accuracy. We show that Consensus can be solved even with unreliable failure detectors that make an infinite number of mistakes, and determine which ones can be used to solve Consensus despite any number of crashes, and which ones require a majority of correct processes. We prove that Consensus and Atomic Broadcast are reducible to each other in asynchronous systems with crash failures; thus, the above results also apply to Atomic Broadcast. A companion paper shows that one of the failure detectors introduced here is the weakest failure detector for solving Consensus [Chandra et al. 1992].Keywords
This publication has 17 references indexed in Scilit:
- Implementing fault-tolerant services using the state machine approach: a tutorialACM Computing Surveys, 1990
- Automatically increasing the fault-tolerance of distributed algorithmsJournal of Algorithms, 1990
- Preserving and using context information in interprocess communicationACM Transactions on Computer Systems, 1989
- Reliable scheduling in a TMR database systemACM Transactions on Computer Systems, 1989
- Consensus in the presence of partial synchronyJournal of the ACM, 1988
- Reliable communication in the presence of failuresACM Transactions on Computer Systems, 1987
- On the minimal synchronism needed for distributed consensusJournal of the ACM, 1987
- Reaching approximate agreement in the presence of faultsJournal of the ACM, 1986
- Asynchronous consensus and broadcast protocolsJournal of the ACM, 1985
- Reliable broadcast protocolsACM Transactions on Computer Systems, 1984