An Advanced MapReduce: Cloud MapReduce, Enhancements and Applications
- 25 April 2014
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Network and Service Management
- Vol. 11 (1), 101-115
- https://doi.org/10.1109/tnsm.2014.031714.130407
Abstract
Cloud computing has recently attracted great attention because it provides configurable computing resources. MapReduce (MR) is a popular framework for data-intensive distributed computing of batch jobs. MapReduce suffers from the following drawbacks:
1. It processes the Map and Reduce phases sequentially.
2. Being cluster-based, its scalability is relatively limited.
3. It does not support flexible pricing.
4. It does not support stream data processing.
We describe Cloud MapReduce (CMR), which overcomes these limitations. Our results show that CMR is more efficient and runs faster than other implementations of the MR framework. In addition, we show how CMR can be further enhanced to:
1. Support stream data processing in addition to batch data by parallelizing the Map and Reduce phases through a pipelining model.
2. Support flexible pricing using Amazon Cloud's spot instances and deal with massive machine terminations caused by spot price fluctuations.
3. Improve throughput and speed up processing over traditional MR by more than 30% for large data sets.
4. Provide added flexibility and scalability by leveraging features of the cloud computing model.
Click-stream analysis, real-time multimedia processing, time-sensitive analysis, and other stream processing applications can also be supported.
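The pipelining idea mentioned in the abstract, where the Reduce phase starts consuming Map output before the Map phase has finished, can be illustrated with a small single-machine sketch. This is not the CMR implementation: the function names (`mapper`, `reducer`, `pipelined_word_count`), the in-process `queue.Queue`, and the word-count workload are all assumptions chosen for illustration.

```python
import queue
import threading
from collections import Counter

_DONE = object()  # sentinel marking the end of the mapper's output (illustrative)


def mapper(records, out_q):
    """Emit (word, 1) pairs as soon as each record is read."""
    for record in records:
        for word in record.split():
            out_q.put((word.lower(), 1))
    out_q.put(_DONE)


def reducer(in_q, counts):
    """Fold pairs into running totals while the mapper may still be producing."""
    while True:
        item = in_q.get()
        if item is _DONE:
            break
        key, value = item
        counts[key] += value


def pipelined_word_count(records, queue_size=1024):
    """Run the map and reduce stages concurrently, connected by a bounded queue."""
    q = queue.Queue(maxsize=queue_size)
    counts = Counter()
    m = threading.Thread(target=mapper, args=(records, q))
    r = threading.Thread(target=reducer, args=(q, counts))
    m.start()
    r.start()
    m.join()
    r.join()
    return dict(counts)


if __name__ == "__main__":
    print(pipelined_word_count(["a b a", "B c"]))
```

Because the reducer keeps running totals instead of waiting for a complete, sorted Map output, `records` can be any iterable, including a generator that yields records as they arrive. A distributed system would replace the in-process queue with a network-accessible one and would need to handle worker failure, which this sketch does not attempt.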