Fast and Lightweight Execution Time Predictions for Spark Applications

Abstract
Users and operators of cloud-based Spark clusters often require quick insight into how the execution time of an application is likely to be affected by the resources allocated to it, e.g., the number of Spark executor cores assigned, and by the size of the data to be processed. Existing techniques typically require extensive prior executions of the application under various resource allocation settings and data sizes to obtain an accurate model. In this paper, we explore the accuracy of a model built from fewer prior executions. Such a model is useful in situations where predictions are needed quickly and few cluster resources are available for model building. We use logs from two executions of an application on small sample data with different resource settings, and we evaluate the accuracy of the resulting predictions for other resource allocation settings and input data sizes.
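The abstract does not specify the model form, but the approach it describes, fitting a predictor from only two prior runs, can be illustrated with a minimal sketch. The two-parameter model below (a fixed overhead plus a term proportional to data size divided by executor cores) and the function names `fit_model`/`predict` are illustrative assumptions, not the paper's actual method; with exactly two observations, the two coefficients are determined by solving a 2x2 linear system.

```python
import numpy as np

def fit_model(runs):
    """Fit an assumed two-parameter runtime model from two profiling runs.

    runs: list of two tuples (executor_cores, data_size_gb, runtime_s).
    Model (an illustrative assumption): t = c0 + c1 * data_size / cores,
    i.e., a fixed startup cost plus work that parallelizes across cores.
    """
    A = np.array([[1.0, d / k] for k, d, _ in runs])
    y = np.array([t for _, _, t in runs])
    c0, c1 = np.linalg.solve(A, y)  # exact solve: 2 unknowns, 2 runs
    return c0, c1

def predict(c0, c1, cores, data_size_gb):
    """Predict runtime for an unseen resource setting and data size."""
    return c0 + c1 * data_size_gb / cores
```

In this sketch, varying only the core count between the two sample runs (as the abstract's setup suggests) is enough to separate the serial and parallel components; the fitted model then extrapolates to larger inputs and different core counts.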
