SparkBench

6 May 2015

conference paper
Published by Association for Computing Machinery (ACM)

https://doi.org/10.1145/2742854.2747283

Abstract

Spark has been increasingly adopted by industry in recent years for big data analysis because it provides a fault-tolerant, scalable, and easy-to-use in-memory abstraction. Moreover, the community has been actively developing a rich ecosystem around Spark, making it even more attractive. However, the literature does not yet contain a Spark-specific benchmark to guide the development and cluster deployment of Spark so that it better fits the resource demands of user applications. In this paper, we present SparkBench, a Spark-specific benchmarking suite that includes a comprehensive set of applications. SparkBench covers four main categories of applications: machine learning, graph computation, SQL query, and streaming. We also characterize the resource consumption, data flow, and timing information of each application, and evaluate the performance impact of a key configuration parameter to guide the design and optimization of Spark data analytics platforms.

Cited by 101 articles