An Advanced MapReduce: Cloud MapReduce, Enhancements and Applications

Abstract
Cloud computing has recently attracted considerable attention because it provides configurable computing resources. MapReduce (MR) is a popular framework for data-intensive distributed computing of batch jobs. However, it suffers from the following drawbacks: (1) the Map and Reduce phases are processed sequentially; (2) being cluster-based, its scalability is relatively limited; (3) it does not support flexible pricing; and (4) it does not support stream data processing. We describe Cloud MapReduce (CMR), which overcomes these limitations. Our results show that CMR is more efficient and runs faster than other implementations of the MR framework. In addition, we show how CMR can be further enhanced to: (1) support stream data processing in addition to batch data by parallelizing the Map and Reduce phases through a pipelining model; (2) support flexible pricing using Amazon Cloud's spot instances and cope with massive machine terminations caused by spot price fluctuations; (3) improve throughput and speed up processing over traditional MR by more than 30% for large data sets; and (4) provide added flexibility and scalability by leveraging features of the cloud computing model. Click-stream analysis, real-time multimedia processing, time-sensitive analysis, and other stream processing applications can also be supported.
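
The pipelining idea mentioned above, in which Reduce starts consuming intermediate pairs while Map is still producing them, can be illustrated with a minimal sketch. This is not the CMR implementation: it assumes a single process, two threads, and an in-memory `queue.Queue` standing in for whatever intermediate storage a real deployment would use. The names `mapper`, `reducer`, and `pipelined_word_count` are hypothetical and chosen only for this word-count illustration.

```python
import queue
import threading
from collections import Counter

_DONE = object()  # sentinel marking the end of the input stream


def mapper(records, out_q):
    """Emit (word, 1) pairs as soon as each record is read."""
    for record in records:
        for word in record.split():
            out_q.put((word.lower(), 1))
    out_q.put(_DONE)


def reducer(in_q, totals):
    """Fold pairs into running totals while the mapper is still producing."""
    while True:
        item = in_q.get()
        if item is _DONE:
            break
        key, value = item
        totals[key] += value


def pipelined_word_count(records):
    # Assumption: a bounded in-memory queue stands in for the
    # intermediate store between the Map and Reduce phases.
    q = queue.Queue(maxsize=1024)
    totals = Counter()
    map_thread = threading.Thread(target=mapper, args=(records, q))
    reduce_thread = threading.Thread(target=reducer, args=(q, totals))
    map_thread.start()
    reduce_thread.start()
    map_thread.join()
    reduce_thread.join()
    return dict(totals)
```

Because the reducer keeps running totals rather than waiting for a completed Map phase, `records` could in principle be an unbounded iterator, with partial results read from `totals` at any time. That is the property that makes a pipelined design suitable for stream data in addition to batch jobs.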
