Why Does the Cloud Stop Computing?
- 5 October 2016
- conference paper
- Published by Association for Computing Machinery (ACM)
Abstract
We conducted a cloud outage study (COS) of 32 popular Internet services. We analyzed 1247 headline news and public post-mortem reports that detail 597 unplanned outages that occurred within a 7-year span from 2009 to 2015. We analyzed outage duration, root causes, impacts, and fix procedures. This study reveals the broader availability landscape of modern cloud services and provides answers to why outages still take place even with pervasive redundancies.
This publication has 19 references indexed in Scilit:
- Failure Analysis of Jobs in Compute Clouds: A Google Cluster Case Study. Published by Institute of Electrical and Electronics Engineers (IEEE), 2014
- Failure Analysis of Virtual and Physical Machines: Patterns, Causes and Characteristics. Published by Institute of Electrical and Electronics Engineers (IEEE), 2014
- Limplock. Published by Association for Computing Machinery (ACM), 2013
- An empirical study on configuration errors in commercial and open source systems. Published by Association for Computing Machinery (ACM), 2011
- PREFAIL. Published by Association for Computing Machinery (ACM), 2011
- Understanding network failures in data centers. Published by Association for Computing Machinery (ACM), 2011
- An analysis of latent sector errors in disk drives. Published by Association for Computing Machinery (ACM), 2007
- BlueGene/L Failure Analysis and Prediction Models. Published by Institute of Electrical and Electronics Engineers (IEEE), 2006
- A large-scale study of failures in high-performance computing systems. Published by Institute of Electrical and Electronics Engineers (IEEE), 2006
- Basic concepts and taxonomy of dependable and secure computing. IEEE Transactions on Dependable and Secure Computing, 2004