Active Replication at (Almost) No Cost
- 1 October 2011
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
MapReduce has become a popular programming paradigm in the domain of batch processing systems. Its simplicity allows applications to be highly scalable and to be easily deployed on large clusters. More recently, the MapReduce approach has also been applied to Event Stream Processing (ESP) systems. This approach, which we call StreamMapReduce, has enabled many novel applications that require both scalability and low latency. Another recent trend is to move distributed applications to public clouds such as Amazon EC2 rather than running and maintaining private data centers. Most cloud providers charge their customers by the hour rather than by CPU cycles consumed. However, many applications, especially those that process online data, need to limit their CPU utilization to conservative levels (often as low as 50%) so that they can accommodate natural and sudden load variations without an unacceptable deterioration in responsiveness. In this paper, we present a new fault tolerance approach based on active replication for StreamMapReduce systems. This approach is cost-effective for cloud consumers as well as cloud providers. Cost-effectiveness is achieved by fully utilizing the acquired computational resources without performance degradation and by reducing the need for additional nodes dedicated to fault tolerance.