DARE
- 15 June 2015
- conference paper
- Published by Association for Computing Machinery (ACM) in Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing
Abstract
The increasing amount of data that needs to be collected and analyzed requires large-scale datacenter architectures that are naturally more susceptible to faults of single components. One way to offer consistent services on such unreliable systems is to use replicated state machines (RSMs). Yet, traditional RSM protocols cannot deliver the needed latency and request rates for future large-scale systems. In this paper, we propose a new set of protocols based on Remote Direct Memory Access (RDMA) primitives. To assess these mechanisms, we use a strongly consistent key-value store; the evaluation shows that our simple protocols improve RSM performance by more than an order of magnitude. Furthermore, we show that RDMA introduces various new options, such as log access management. Our protocols enable operators to fully utilize the new capabilities of the quickly growing number of RDMA-capable datacenter networks.
Funding Information
- Microsoft Research