CloudSeer
- 25 March 2016
- journal article
- conference paper
- Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News
- Vol. 44 (2), 489-502
- https://doi.org/10.1145/2980024.2872407
Abstract
Cloud infrastructures provide a rich set of management tasks that operate computing, storage, and networking resources in the cloud. Monitoring the executions of these tasks is crucial for cloud providers to promptly find and understand problems that compromise cloud availability. However, such monitoring is challenging because there are multiple distributed service components involved in the executions. CloudSeer enables effective workflow monitoring. It takes a lightweight non-intrusive approach that purely works on interleaved logs widely existing in cloud infrastructures. CloudSeer first builds an automaton for the workflow of each management task based on normal executions, and then it checks log messages against a set of automata for workflow divergences in a streaming manner. Divergences found during the checking process indicate potential execution problems, which may or may not be accompanied by error log messages. For each potential problem, CloudSeer outputs necessary context information including the affected task automaton and related log messages hinting where the problem occurs to help further diagnosis. Our experiments on OpenStack, a popular open-source cloud infrastructure, show that CloudSeer's efficiency and problem-detection capability are suitable for online monitoring.
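The two phases described in the abstract (learning a task automaton from normal executions, then checking a log stream against it) can be illustrated with a small sketch. This is not CloudSeer's algorithm, which the abstract does not spell out. It is a deliberately simplified stand-in that assumes every log line has already been reduced to a message template and carries a request identifier, so interleaved messages can be routed to the right workflow instance. The names `build_automaton`, `check_stream`, the `request_id` field, and the sample templates are all hypothetical.

```python
from collections import defaultdict

START = "<start>"  # hypothetical initial state shared by every workflow instance


def build_automaton(normal_runs):
    """Learn allowed transitions from normal runs of ONE management task.

    Simplification: a state is just the last template seen, so the
    automaton is a set of allowed successor templates per state.
    """
    transitions = defaultdict(set)
    accepting = set()
    for run in normal_runs:
        state = START
        for template in run:
            transitions[state].add(template)
            state = template
        accepting.add(state)  # the last message of a normal run ends the task
    return {"transitions": dict(transitions), "accepting": accepting}


def check_stream(automaton, task_name, stream):
    """Check interleaved (request_id, template) pairs one message at a time.

    Assumption: each message carries a request_id. Returns a list of
    divergence reports with the context needed for further diagnosis.
    """
    state = {}                   # request_id -> current automaton state
    history = defaultdict(list)  # request_id -> messages accepted so far
    reports = []
    for request_id, template in stream:
        current = state.get(request_id, START)
        allowed = automaton["transitions"].get(current, set())
        if template not in allowed:
            # Divergence: no transition for this message from the current state.
            reports.append({
                "kind": "unexpected_message",
                "task": task_name,
                "request_id": request_id,
                "state": current,
                "message": template,
                "history": list(history[request_id]),
            })
            continue
        state[request_id] = template
        history[request_id].append(template)
    # When the stream ends, any instance outside an accepting state never finished.
    for request_id, current in state.items():
        if current not in automaton["accepting"]:
            reports.append({
                "kind": "incomplete_task",
                "task": task_name,
                "request_id": request_id,
                "state": current,
                "message": None,
                "history": list(history[request_id]),
            })
    return reports


# Hypothetical templates for a VM boot task; not taken from real OpenStack logs.
normal_runs = [[
    "api: boot request accepted",
    "scheduler: host selected",
    "compute: instance spawned",
]]
boot = build_automaton(normal_runs)

stream = [
    ("r1", "api: boot request accepted"),
    ("r2", "api: boot request accepted"),
    ("r1", "scheduler: host selected"),
    ("r2", "compute: instance spawned"),  # r2 skipped the scheduler step
    ("r1", "compute: instance spawned"),
]
reports = check_stream(boot, "boot", stream)
for report in reports:
    print(report["kind"], report["request_id"], report["state"])
```

In this sketch a divergence is reported even though no message is an error message, which mirrors the abstract's point that problems may or may not come with error logs. A real monitor would also need a timeout to flag stalled tasks while the stream is still running, and a way to associate messages that lack a shared identifier; both are left out here.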