Understanding the Influence of Configuration Settings: An Execution Model-Driven Framework for Apache Spark Platform
- 1 June 2017
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 802-807
- https://doi.org/10.1109/cloud.2017.119
Abstract
Apache Spark provides numerous configuration settings that can be tuned to improve the performance of specific applications running on the platform. However, due to its multi-stage execution model and high interactive complexity across nodes, it is nontrivial to understand how or why a specific setting influences the execution flow and performance. To address this challenge, we develop an execution model-driven framework that extracts key performance metrics relevant to different levels of execution (e.g., application level, stage level, task level, system level) and applies statistical analysis techniques to identify the key execution features that change significantly in response to changes in configuration settings. This allows users to answer questions such as "How does configuration setting X affect the execution behavior of Spark?" or "Why does changing configuration setting X degrade the performance of Spark application Y?". We tested our framework using six open source applications (Word Count, Tera Sort, KMeans, Matrix Factorization, PageRank, and Triangle Count) and demonstrated the effectiveness of our framework in identifying the underlying reasons behind changes in performance.
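As a rough illustration of the idea in the abstract, the sketch below compares execution metrics collected under two configuration settings and flags the ones that shift significantly. The abstract does not name the statistical technique the authors use, so Welch's t statistic is an assumption here, and the metric names, sample values, and threshold are all hypothetical.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (b relative to a)."""
    diff = mean(b) - mean(a)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Both samples are constant: no change, or an unbounded change.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / se


def changed_features(baseline, variant, threshold=3.0):
    """Return (metric, t) pairs with |t| above the threshold, largest first.

    The threshold is an illustrative cut-off, not a value from the paper.
    """
    scores = {
        name: welch_t(baseline[name], variant[name])
        for name in baseline
        if name in variant
    }
    flagged = [(name, t) for name, t in scores.items() if abs(t) > threshold]
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical per-run metrics for one application under two settings.
baseline = {
    "task_duration_ms": [410, 395, 402, 420, 398],
    "shuffle_read_mb": [120, 118, 121, 119, 122],
    "gc_time_ms": [30, 28, 33, 31, 29],
}
variant = {
    "task_duration_ms": [610, 640, 598, 625, 630],
    "shuffle_read_mb": [119, 121, 120, 118, 122],
    "gc_time_ms": [95, 102, 99, 97, 104],
}

result = changed_features(baseline, variant)
for name, t in result:
    print(f"{name}: t = {t:.1f}")
```

With this made-up data, garbage collection time and task duration are flagged while shuffle volume is not, which is the kind of evidence that could help explain why a setting degrades performance.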