Fast and Lightweight Execution Time Predictions for Spark Applications

Abstract
Users and operators of cloud-based Spark clusters often require quick insight into how the execution time of an application is likely to be affected by the resources allocated to it, e.g., the number of Spark executor cores assigned, and by the size of the data to be processed. Existing techniques typically require extensive prior executions of the application under various resource allocation settings and data sizes to obtain an accurate model. In this paper, we explore the accuracy of a model built from fewer prior executions. Such a model is useful in situations where predictions are needed quickly and few cluster resources are available for model building. We use logs from two executions of an application on small sample data with different resource settings, and we evaluate the accuracy of the resulting predictions for other resource allocation settings and input data sizes.
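The abstract does not specify the model form, but the approach it describes, fitting a predictor from only two prior runs, can be illustrated with a minimal sketch. The two-parameter model below (a fixed overhead plus a term proportional to data size divided by executor cores) and the function names `fit_model`/`predict` are illustrative assumptions, not the paper's actual method; with exactly two observations, the two coefficients are determined by solving a 2x2 linear system.

```python
import numpy as np

def fit_model(runs):
    """Fit an assumed two-parameter runtime model from two profiling runs.

    runs: list of two tuples (executor_cores, data_size_gb, runtime_s).
    Model (an illustrative assumption): t = c0 + c1 * data_size / cores,
    i.e., a fixed startup cost plus work that parallelizes across cores.
    """
    A = np.array([[1.0, d / k] for k, d, _ in runs])
    y = np.array([t for _, _, t in runs])
    c0, c1 = np.linalg.solve(A, y)  # exact solve: 2 unknowns, 2 runs
    return c0, c1

def predict(c0, c1, cores, data_size_gb):
    """Predict runtime for an unseen resource setting and data size."""
    return c0 + c1 * data_size_gb / cores
```

In this sketch, varying only the core count between the two sample runs (as the abstract's setup suggests) is enough to separate the serial and parallel components; the fitted model then extrapolates to larger inputs and different core counts.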
