PeerReview
- 14 October 2007
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGOPS Operating Systems Review
- Vol. 41 (6), 175-188
- https://doi.org/10.1145/1323293.1294279
Abstract
We describe PeerReview, a system that provides accountability in distributed systems. PeerReview ensures that Byzantine faults whose effects are observed by a correct node are eventually detected and irrefutably linked to a faulty node. At the same time, PeerReview ensures that a correct node can always defend itself against false accusations. These guarantees are particularly important for systems that span multiple administrative domains, which may not trust each other.
PeerReview works by maintaining a secure record of the messages sent and received by each node. The record is used to automatically detect when a node's behavior deviates from that of a given reference implementation, thus exposing faulty nodes. PeerReview is widely applicable: it only requires that a correct node's actions are deterministic, that nodes can sign messages, and that each node is periodically checked by a correct node. We demonstrate that PeerReview is practical by applying it to three different types of distributed systems: a network filesystem, a peer-to-peer system, and an overlay multicast system.
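The abstract's core mechanism is a tamper-evident per-node record of sent and received messages, audited by replaying it against a deterministic reference implementation. It can be illustrated with the minimal Python sketch below. This is not PeerReview's actual implementation. The hash-chain layout, the names `SecureLog`, `authenticator`, `audit`, and `ReferenceCounter`, and the toy running-sum protocol are all hypothetical. HMAC stands in for the digital signatures the abstract requires. As a shared-key MAC, HMAC cannot make evidence irrefutable to third parties the way a signature can.

```python
import hashlib
import hmac
from dataclasses import dataclass, field

GENESIS = b"\x00" * 32  # hypothetical starting hash for an empty log


@dataclass
class LogEntry:
    seq: int
    kind: str      # "SEND" or "RECV" (hypothetical entry types)
    content: str
    digest: bytes  # chained hash covering this entry and every earlier one


def chain_hash(prev: bytes, seq: int, kind: str, content: str) -> bytes:
    """h_k = H(h_{k-1} || seq || kind || H(content)), using SHA-256."""
    h = hashlib.sha256()
    h.update(prev)
    h.update(seq.to_bytes(8, "big"))
    h.update(kind.encode())
    h.update(hashlib.sha256(content.encode()).digest())
    return h.digest()


@dataclass
class SecureLog:
    entries: list = field(default_factory=list)

    def append(self, kind: str, content: str) -> LogEntry:
        prev = self.entries[-1].digest if self.entries else GENESIS
        seq = len(self.entries)
        entry = LogEntry(seq, kind, content, chain_hash(prev, seq, kind, content))
        self.entries.append(entry)
        return entry


def authenticator(key: bytes, entry: LogEntry) -> bytes:
    """Commitment to the log prefix ending at `entry` (HMAC as a stand-in for a signature)."""
    return hmac.new(key, entry.seq.to_bytes(8, "big") + entry.digest, hashlib.sha256).digest()


def matches_authenticator(key: bytes, entries: list, seq: int, auth: bytes) -> bool:
    """True if the log shown now agrees with an authenticator the node issued earlier."""
    if seq >= len(entries):
        return False
    return hmac.compare_digest(authenticator(key, entries[seq]), auth)


def verify_chain(entries: list) -> bool:
    """Recompute every chained hash; an edited, dropped, or reordered entry breaks the chain."""
    prev = GENESIS
    for i, e in enumerate(entries):
        if e.seq != i or e.digest != chain_hash(prev, e.seq, e.kind, e.content):
            return False
        prev = e.digest
    return True


class ReferenceCounter:
    """Hypothetical deterministic reference implementation: replies with a running sum."""

    def __init__(self):
        self.total = 0

    def on_receive(self, msg: str) -> list:
        self.total += int(msg)
        return [str(self.total)]


def audit(entries: list, make_reference) -> bool:
    """Replay logged inputs through a fresh reference instance and compare its outputs
    with the logged sends. Simplification: replay always starts from the initial state."""
    if not verify_chain(entries):
        return False
    ref = make_reference()
    pending = []  # outputs the reference produced that the log has not yet shown as sent
    for e in entries:
        if e.kind == "RECV":
            pending.extend(ref.on_receive(e.content))
        elif e.kind == "SEND":
            if not pending or pending.pop(0) != e.content:
                return False  # the node sent something the reference would not have sent
    return not pending  # in this toy model, an omitted send also counts as a deviation


# Usage: an honest node, and a faulty one that sends 6 where the reference sends 5.
honest, faulty = SecureLog(), SecureLog()
for log, last in ((honest, "5"), (faulty, "6")):
    log.append("RECV", "2")
    log.append("SEND", "2")
    log.append("RECV", "3")
    log.append("SEND", last)
print(audit(honest.entries, ReferenceCounter))  # True
print(audit(faulty.entries, ReferenceCounter))  # False
```

Chaining each entry's hash to its predecessor means that one authenticator over the latest entry commits a node to its entire history. If the faulty node later rewrites its log to hide the bad send, the recomputed hash no longer matches the authenticator a peer already holds. The replay step is only meaningful because the abstract requires a correct node's actions to be deterministic.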