Understanding the Influence of Configuration Settings: An Execution Model-Driven Framework for Apache Spark Platform

1 June 2017

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

pp. 802-807
https://doi.org/10.1109/cloud.2017.119

Abstract

Apache Spark provides numerous configuration settings that can be tuned to improve the performance of specific applications running on the platform. However, because of its multi-stage execution model and the complex interactions across nodes, it is nontrivial to understand how and why a specific setting influences the execution flow and performance. To address this challenge, we develop an execution model-driven framework that extracts key performance metrics relevant to different levels of execution (e.g., application level, stage level, task level, system level) and applies statistical analysis techniques to identify the execution features that change significantly in response to changes in configuration settings. This allows users to answer questions such as "How does configuration setting X affect the execution behavior of Spark?" or "Why does changing configuration setting X degrade the performance of Spark application Y?". We tested our framework using six open-source applications (Word Count, Tera Sort, KMeans, Matrix Factorization, PageRank, and Triangle Count) and demonstrated its effectiveness in identifying the underlying reasons behind changes in performance.
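The abstract does not say which statistical test the framework uses to decide that an execution feature "changes significantly". The sketch below illustrates the general idea under stated assumptions. It compares per-task samples of each metric collected under a baseline configuration and a changed configuration. It uses a two-sided permutation test on the difference in means as a stand-in technique, together with a Bonferroni correction across metrics. The metric names (`gc_time_ms`, `shuffle_read_mb`), their values, and the helper functions are hypothetical and are not taken from the paper.

```python
import random


def mean(xs):
    """Arithmetic mean of a non-empty sequence."""
    return sum(xs) / len(xs)


def permutation_p_value(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in means of samples a and b."""
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign samples to the two configurations
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_resamples + 1)


def significant_features(baseline, changed, alpha=0.05):
    """Return {metric: p_value} for metrics whose mean shifts significantly.

    baseline / changed map a metric name to per-task samples collected under
    two configurations (hypothetical data layout). A Bonferroni correction
    divides alpha by the number of metrics tested.
    """
    common = baseline.keys() & changed.keys()
    if not common:
        return {}
    threshold = alpha / len(common)
    flagged = {}
    for metric in common:
        p = permutation_p_value(baseline[metric], changed[metric])
        if p < threshold:
            flagged[metric] = p
    # Most significant (smallest p-value) first.
    return dict(sorted(flagged.items(), key=lambda kv: kv[1]))


# Hypothetical per-task metrics before and after changing one setting.
baseline = {
    "gc_time_ms": [120, 115, 130, 125, 118, 122, 128, 119],
    "shuffle_read_mb": [64, 63, 65, 64, 66, 63, 64, 65],
}
changed = {
    "gc_time_ms": [40, 45, 38, 42, 44, 39, 41, 43],
    "shuffle_read_mb": [64, 65, 63, 64, 65, 64, 63, 66],
}

flagged = significant_features(baseline, changed)
print(flagged)  # only gc_time_ms is expected to be flagged
```

A permutation test is used here because it makes no normality assumption about per-task metrics, which are often skewed. A framework like the one described could equally use a t-test, a Mann-Whitney U test, or another method.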

Cited by 12 articles