Anomaly Detection Using Program Control Flow Graph Mining From Execution Logs
- 13 August 2016
- conference paper
- Published by Association for Computing Machinery (ACM) in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
- p. 215-224
- https://doi.org/10.1145/2939672.2939712
Abstract
We focus on the problem of detecting anomalous run-time behavior of distributed applications from their execution logs. Specifically, we mine templates and template sequences from logs to form a control flow graph (CFG) spanning distributed components. This CFG represents the baseline healthy system state and is used to flag deviations from expected behavior in runtime logs. The novelty of our work stems from new techniques that: (1) overcome the instrumentation requirements or application-specific assumptions made in prior log mining approaches, (2) improve the accuracy of mined templates and of the CFG in the presence of long parameters and a high degree of interleaving, respectively, and (3) improve by orders of magnitude the scalability of the CFG mining process in terms of the volume of log data that can be processed per day. We evaluate our approach using (a) synthetic log traces and (b) multiple real-world log datasets collected at different layers of the application stack. Results demonstrate that our template mining, CFG mining, and anomaly detection algorithms have high accuracy. The distributed implementation of our pipeline is highly scalable and can process more than 500 GB/day of log data even on a cluster of 10 low-end VMs (Spark + Hadoop). We also demonstrate the efficacy of our end-to-end system using a case study with the OpenStack VM provisioning system.
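The general idea in the abstract (reduce log lines to templates, record which templates follow which in healthy runs, then flag unseen transitions at run time) can be illustrated with a toy sketch. This is not the paper's algorithm: the function names `to_template`, `mine_cfg`, and `find_anomalies` are hypothetical, the template rule (masking numbers and hex values) is a crude stand-in for real template mining, and the traces are assumed to be already separated per request, so the interleaving problem the paper addresses is ignored entirely.

```python
import re
from collections import defaultdict


def to_template(line):
    """Crude template extraction (an assumption, not the paper's miner):
    mask hex values and decimal numbers as the wildcard <*>."""
    line = re.sub(r"\b0x[0-9a-fA-F]+\b", "<*>", line)
    line = re.sub(r"\b\d+\b", "<*>", line)
    return line.strip()


def mine_cfg(healthy_traces):
    """Build a successor map: template -> set of templates seen directly after it.
    Each trace is assumed to be one non-interleaved sequence of log lines."""
    cfg = defaultdict(set)
    for trace in healthy_traces:
        templates = [to_template(line) for line in trace]
        for cur, nxt in zip(templates, templates[1:]):
            cfg[cur].add(nxt)
    return cfg


def find_anomalies(cfg, trace):
    """Return (position, previous_template, unexpected_template) for every
    transition in the trace that was never observed in the healthy baseline."""
    templates = [to_template(line) for line in trace]
    anomalies = []
    for i, (cur, nxt) in enumerate(zip(templates, templates[1:])):
        if nxt not in cfg.get(cur, set()):
            anomalies.append((i + 1, cur, nxt))
    return anomalies


# Made-up log lines, for illustration only.
healthy = [
    ["request 17 received", "allocating vm 17", "vm 17 active"],
    ["request 42 received", "allocating vm 42", "vm 42 active"],
]
cfg = mine_cfg(healthy)

runtime = ["request 99 received", "allocating vm 99", "error: quota exceeded for 99"]
anomalies = find_anomalies(cfg, runtime)
print(anomalies)
```

In this toy run the only flagged transition is the one from the allocation template to the error template, because no healthy trace contains it. A practical system would also need to handle interleaved logs from concurrent requests, long parameters, and missing expected successors, which are the problems the abstract highlights.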