Soothsayer: Predicting Capacity Usage in Backup Storage Systems

Abstract
Protecting data from loss is crucial to businesses. Failure to protect data can lead to heavy financial and strategic losses that are often difficult to recover from. Businesses therefore employ backup techniques that store copies of data to enable recovery after a failure. Surprisingly, backups often fail. Our analysis of about 48,000 installations over a period of 3 years shows that one in six errors results from inadequate storage capacity, and yet little research has been done on storage capacity forecasting that could mitigate these errors. In this paper, we propose Soothsayer, a simulation model that accurately predicts capacity usage by employing three techniques: autoregressive and moving-average modeling, clustering and stochastic modeling, and linear regression. Furthermore, our models provide a range of times when capacity is likely to be reached, rather than a single point estimate, which is more useful for capacity planning. We evaluate the accuracy of our models using synthetic data as well as real-world data. Our results show that our models outperform the piecewise regression method previously proposed by Chamness when applied to nonlinear datasets, while performing comparably when applied to linear datasets.
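
To make the idea of a time range (rather than a single point estimate) concrete, the following is a minimal sketch of the simplest of the three technique families named above, linear regression. It is not Soothsayer's actual model: the function names `fit_line` and `time_to_full`, the parameter `k`, and the choice to derive the range by shifting the fitted line by `k` residual standard deviations are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def time_to_full(days, used, capacity, k=2.0):
    """Estimate (earliest, expected, latest) day on which usage hits capacity.

    Hypothetical helper: the range comes from shifting the fitted line up
    and down by k standard deviations of the residuals, which is an
    illustrative assumption, not the method described in the paper.
    """
    slope, intercept = fit_line(days, used)
    if slope <= 0:
        return None  # usage is flat or shrinking; capacity is never reached

    residuals = [u - (slope * d + intercept) for d, u in zip(days, used)]
    spread = k * pstdev(residuals)

    def crossing(offset):
        # Day on which the line (shifted by `offset`) reaches capacity.
        return (capacity - intercept - offset) / slope

    return crossing(spread), crossing(0.0), crossing(-spread)


if __name__ == "__main__":
    days = list(range(10))
    used = [100 + 10 * d for d in days]  # synthetic, perfectly linear usage
    print(time_to_full(days, used, capacity=1000))
```

With noisy usage data the three returned values spread apart, giving a planner an interval instead of a single date. A straight line cannot follow nonlinear growth, which is the case the abstract says the other techniques are meant to handle.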
