The Case for Evaluating MapReduce Performance Using Workload Suites
- 1 July 2011
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 390-399
- https://doi.org/10.1109/mascots.2011.12
Abstract
MapReduce systems face enormous challenges due to the growth, diversity, and consolidation of the data and computation involved. Provisioning, configuring, and managing large-scale MapReduce clusters require realistic, workload-specific performance insights that existing MapReduce benchmarks are ill-equipped to supply. In this paper, we build the case for going beyond benchmarks for MapReduce performance evaluations. We analyze and compare two production MapReduce traces to develop a vocabulary for describing MapReduce workloads. We show that existing benchmarks fail to capture the rich workload characteristics observed in the traces, and we propose a framework to synthesize and execute representative workloads. We demonstrate that performance evaluations using realistic workloads give cluster operators new ways to identify workload-specific resource bottlenecks and to make workload-specific choices of MapReduce task schedulers. We expect that, once available, workload suites would allow cluster operators to accomplish previously challenging tasks beyond what we can now imagine, thus serving as a useful tool to help design and manage MapReduce systems.