PeerReview

14 October 2007

journal article
Published by Association for Computing Machinery (ACM) in ACM SIGOPS Operating Systems Review

Vol. 41 (6), 175-188
https://doi.org/10.1145/1323293.1294279

Abstract

We describe PeerReview, a system that provides accountability in distributed systems. PeerReview ensures that Byzantine faults whose effects are observed by a correct node are eventually detected and irrefutably linked to a faulty node. At the same time, PeerReview ensures that a correct node can always defend itself against false accusations. These guarantees are particularly important for systems that span multiple administrative domains, which may not trust each other. PeerReview works by maintaining a secure record of the messages sent and received by each node. The record is used to automatically detect when a node's behavior deviates from that of a given reference implementation, thus exposing faulty nodes. PeerReview is widely applicable: it only requires that a correct node's actions are deterministic, that nodes can sign messages, and that each node is periodically checked by a correct node. We demonstrate that PeerReview is practical by applying it to three different types of distributed systems: a network filesystem, a peer-to-peer system, and an overlay multicast system.
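
The idea in the abstract of keeping a secure message record and checking it against a deterministic reference implementation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the abstract: the hash-chained log format, the names `MessageLog`, `chain_is_intact`, `reference_node`, and `audit`, and the toy reference behavior (reply with the upper-cased message). Message signing, which the abstract lists as a requirement, is omitted here for brevity.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the hash chain


def entry_hash(prev_hash, kind, payload):
    """Hash an entry together with the hash of the previous entry."""
    data = json.dumps([prev_hash, kind, payload]).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


class MessageLog:
    """Append-only record of messages a node sent and received (hypothetical format)."""

    def __init__(self):
        self.entries = []  # list of (kind, payload, hash) tuples

    def append(self, kind, payload):
        prev = self.entries[-1][2] if self.entries else GENESIS
        self.entries.append((kind, payload, entry_hash(prev, kind, payload)))


def chain_is_intact(entries):
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev = GENESIS
    for kind, payload, stored in entries:
        if entry_hash(prev, kind, payload) != stored:
            return False
        prev = stored
    return True


def reference_node(message):
    """Toy deterministic reference behavior (an assumption): reply in upper case."""
    return message.upper()


def audit(entries, reference=reference_node):
    """Replay received messages through the reference and compare with logged sends."""
    if not chain_is_intact(entries):
        return False
    expected = None  # reply the reference says must come next, if any
    for kind, payload, _ in entries:
        if kind == "recv":
            if expected is not None:
                return False  # an earlier reply was never sent
            expected = reference(payload)
        elif kind == "send":
            if payload != expected:
                return False  # node deviated from the reference
            expected = None
        else:
            return False
    return expected is None


log = MessageLog()
log.append("recv", "ping")
log.append("send", "PING")
print(audit(log.entries))
```

Because the reference is deterministic, an auditor holding only the log can recompute what a correct node would have sent, which is why the abstract lists determinism as a requirement.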


Cited by 72 articles