Surviving switch failures in cloud datacenters
- 11 April 2021
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in ACM SIGCOMM Computer Communication Review
- Vol. 51 (2), 2-9
- https://doi.org/10.1145/3464994.3464996
Abstract
Switch failures can hamper access to client services, cause link congestion and blackhole network traffic. In this study, we examine the nature of switch failures in the datacenters of a large commercial cloud provider through the lens of survival theory. We study a cohort of over 180,000 switches with a variety of hardware and software configurations and find that datacenter switches have a 98% likelihood of functioning uninterrupted for over 3 months since deployment in production. However, there is significant heterogeneity in switch survival rates with respect to their hardware and software: the switches of one vendor are twice as likely to fail compared to the others. We attribute the majority of switch failures to hardware impairments and unplanned power losses. We find that the in-house switch operating system, SONiC, boosts the survival likelihood of switches in datacenters by 1% by eliminating switch failures caused by software bugs in vendor switch OSes.
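To make a figure like "98% likelihood of functioning uninterrupted for over 3 months" concrete, here is a minimal sketch of the Kaplan-Meier estimator, a standard survival-analysis tool for cohorts in which many units have not failed by the end of observation. The abstract does not say which estimator the authors used, so this is an assumption for illustration only. The function name `kaplan_meier` and the toy cohort are hypothetical and are not data from the study.

```python
from collections import Counter


def kaplan_meier(durations, failed):
    """Return [(t, S(t))] at each distinct failure time.

    durations: days each switch was observed after deployment
    failed:    True if the switch failed at that time,
               False if it was still healthy when observation ended (censored)
    """
    deaths = Counter(t for t, f in zip(durations, failed) if f)
    exits = Counter(durations)  # failures and censorings leaving the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk switches that survive time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy cohort of 8 switches (not data from the study).
durations = [30, 45, 45, 60, 90, 90, 90, 90]
failed = [True, False, True, False, False, False, False, True]

for day, s in kaplan_meier(durations, failed):
    print(f"day {day}: estimated survival {s:.4f}")
```

Censored switches still count toward the at-risk population until they leave observation, which is why the estimate differs from a naive "fraction failed" calculation.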