iMapReduce: A Distributed Computing Framework for Iterative Computation

1 May 2011

conference paper
conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 1112-1121
https://doi.org/10.1109/ipdps.2011.260

Abstract

Relational data are pervasive in many applications such as data mining or social network analysis. These relational data are typically massive containing at least millions or hundreds of millions of relations. This poses demand for the design of distributed computing frameworks for processing these data on a large cluster. MapReduce is an example of such a framework. However, many relational data based applications typically require parsing the relational data iteratively and need to operate on these data through many iterations. MapReduce lacks built-in support for the iterative process. This paper presents iMapReduce, a framework that supports iterative processing. iMapReduce allows users to specify the iterative operations with map and reduce functions, while supporting the iterative processing automatically without the need of users' involvement. More importantly, iMapReduce significantly improves the performance of iterative algorithms by (1) reducing the overhead of creating a new task in every iteration, (2) eliminating the shuffling of the static data in the shuffle stage of MapReduce, and (3) allowing asynchronous execution of each iteration, i.e., an iteration can start before all tasks of a previous iteration have finished. We implement iMapReduce based on Apache Hadoop, and show that iMapReduce can achieve a factor of 1.2 to 5 speedup over those implemented on MapReduce for well-known iterative algorithms.

Keywords

This publication has 8 references indexed in Scilit:

Design patterns for efficient graph algorithms in MapReduce
Published by Association for Computing Machinery (ACM) ,2010
Twister
Published by Association for Computing Machinery (ACM) ,2010
PEGASUS: A Peta-Scale Graph Mining System Implementation and Observations
Published by Institute of Electrical and Electronics Engineers (IEEE) ,2009
Power-Law Distributions in Empirical Data
Siam Review, 2009
Scalable proximity estimation and link prediction in online social networks
Published by Association for Computing Machinery (ACM) ,2009
User interactions in social networks and their implications
Published by Association for Computing Machinery (ACM) ,2009
Video suggestion and discovery for youtube
Published by Association for Computing Machinery (ACM) ,2008
Dryad
Published by Association for Computing Machinery (ACM) ,2007

Cited by 40 articles