Abstract
Hadoop is a widely used general-purpose framework for many classes of data-intensive applications. However, it is poorly suited to iterative operations because of the cost of reloading data from disk at each iteration. Spark, an emerging framework designed around a global cache mechanism, can achieve better response times because intermediate data can be kept in memory across the distributed machines of the cluster throughout the iterative process. Although Spark's time performance relative to Hadoop has been evaluated, memory consumption, another system performance criterion, has not been deeply analyzed in the literature. In this work, we conducted extensive experiments on iterative operations to compare Hadoop and Spark in terms of both time and memory cost. We found that although Spark is generally faster than Hadoop for iterative operations, it pays for this speed with higher memory consumption. Moreover, its speed advantage weakens once memory is no longer sufficient to store newly created intermediate results.
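The caching behavior contrasted above can be illustrated with a minimal Spark sketch (not taken from the paper's experiments; the input path, application name, and iterative computation are hypothetical). The point is that `cache()` pins an RDD in executor memory so that repeated passes avoid re-reading the input from disk, which is exactly the reloading cost the abstract attributes to Hadoop MapReduce.

```scala
import org.apache.spark.sql.SparkSession

object IterativeCacheExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("IterativeCacheExample").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical input: one numeric value per line in an HDFS file.
    val data = sc.textFile("hdfs:///tmp/numbers.txt").map(_.toDouble)

    // cache() keeps the RDD's partitions in executor memory, so each
    // iteration below reuses the in-memory data instead of re-reading
    // the file from disk on every pass.
    data.cache()

    var estimate = 0.0
    for (_ <- 1 to 10) {
      // Without cache(), Spark would recompute the lineage (including the
      // textFile read) for every action in this loop.
      val mean = data.sum() / data.count()
      estimate = (estimate + mean) / 2.0
    }

    println(s"Final estimate: $estimate")
    spark.stop()
  }
}
```

Note that the cached partitions themselves occupy executor memory; when they and newly created intermediate results no longer fit, Spark must spill or recompute, which is the regime in which the abstract reports the speed advantage weakening.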
