iMapReduce: A Distributed Computing Framework for Iterative Computation
- 1 May 2011
- conference paper
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 1112-1121
- https://doi.org/10.1109/ipdps.2011.260
Abstract
Relational data are pervasive in many applications such as data mining or social network analysis. These relational data are typically massive containing at least millions or hundreds of millions of relations. This poses demand for the design of distributed computing frameworks for processing these data on a large cluster. MapReduce is an example of such a framework. However, many relational data based applications typically require parsing the relational data iteratively and need to operate on these data through many iterations. MapReduce lacks built-in support for the iterative process. This paper presents iMapReduce, a framework that supports iterative processing. iMapReduce allows users to specify the iterative operations with map and reduce functions, while supporting the iterative processing automatically without the need of users' involvement. More importantly, iMapReduce significantly improves the performance of iterative algorithms by (1) reducing the overhead of creating a new task in every iteration, (2) eliminating the shuffling of the static data in the shuffle stage of MapReduce, and (3) allowing asynchronous execution of each iteration, i.e., an iteration can start before all tasks of a previous iteration have finished. We implement iMapReduce based on Apache Hadoop, and show that iMapReduce can achieve a factor of 1.2 to 5 speedup over those implemented on MapReduce for well-known iterative algorithms.Keywords
This publication has 8 references indexed in Scilit:
- Design patterns for efficient graph algorithms in MapReducePublished by Association for Computing Machinery (ACM) ,2010
- TwisterPublished by Association for Computing Machinery (ACM) ,2010
- PEGASUS: A Peta-Scale Graph Mining System Implementation and ObservationsPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2009
- Power-Law Distributions in Empirical DataSiam Review, 2009
- Scalable proximity estimation and link prediction in online social networksPublished by Association for Computing Machinery (ACM) ,2009
- User interactions in social networks and their implicationsPublished by Association for Computing Machinery (ACM) ,2009
- Video suggestion and discovery for youtubePublished by Association for Computing Machinery (ACM) ,2008
- DryadPublished by Association for Computing Machinery (ACM) ,2007