SherLog
- 5 March 2010
- journal article
- conference paper
- Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News
- Vol. 38 (1), 143-154
- https://doi.org/10.1145/1735970.1736038
Abstract
Computer systems often fail due to factors such as software bugs or administrator errors. Diagnosing such production-run failures is an important but challenging task, since they are difficult to reproduce in house for several reasons: (1) users' inputs and file contents are unavailable due to privacy concerns; (2) it is difficult to build exactly the same execution environment; and (3) concurrent executions on multiprocessors are non-deterministic. Therefore, programmers often have to diagnose a production-run failure based on logs collected from customers and the corresponding source code. Such diagnosis requires expert knowledge, and narrowing down root causes this way is tedious and time-consuming. To address this problem, we propose a tool called SherLog that analyzes source code, leveraging information provided by run-time logs, to infer what must or may have happened during the failed production run. It requires neither re-execution of the program nor knowledge of the log's semantics. It infers both control and data-value information about the failed execution. We evaluate SherLog on 8 representative real-world software failures (6 software bugs and 2 configuration errors) from 7 applications, including 3 servers. The information inferred by SherLog is very useful for programmers diagnosing these failures. Our results also show that SherLog can analyze large server applications such as Apache, with thousands of logging messages, within only 40 minutes.
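The abstract's core idea, connecting run-time log messages back to the logging statements in source code, can be illustrated with a minimal Python sketch. It assumes printf-style logging calls whose format strings have already been extracted from the source; the locations (`server.c:120` and so on), the `LOG_POINTS` table, and the helpers `format_to_regex` and `match_log` are hypothetical names for illustration, not SherLog's implementation or API. Each format string becomes a regular expression, so a matching log line identifies which logging point must have executed and recovers the values printed for its format arguments.

```python
import re

# Hypothetical logging statements extracted from source code:
# (source location, printf-style format string).
LOG_POINTS = [
    ("server.c:120", "accepted connection from %s"),
    ("server.c:245", "request %d failed with status %d"),
    ("config.c:88", "cannot open file %s"),
]

# Assumption: only %d and %s are supported; %s is limited to
# whitespace-free tokens to keep the sketch simple.
_SPEC_TO_REGEX = {"%d": r"(-?\d+)", "%s": r"(\S+)"}


def format_to_regex(fmt):
    """Turn a printf-style format string into an anchored regex."""
    # Splitting on a capturing group keeps the specifiers in the result.
    parts = re.split(r"(%[ds])", fmt)
    pattern = "".join(_SPEC_TO_REGEX.get(p, re.escape(p)) for p in parts)
    return re.compile("^" + pattern + "$")


def match_log(lines, log_points):
    """Map each runtime log line to the first logging point whose format
    matches it, plus the argument values recovered from the message."""
    compiled = [(loc, format_to_regex(fmt)) for loc, fmt in log_points]
    results = []
    for line in lines:
        for loc, rx in compiled:
            m = rx.match(line)
            if m:
                results.append((loc, m.groups()))
                break
        else:
            # No logging point in the table could have printed this line.
            results.append((None, ()))
    return results


if __name__ == "__main__":
    log = [
        "accepted connection from 10.0.0.5",
        "request 7 failed with status 500",
        "disk full",
    ]
    for loc, values in match_log(log, LOG_POINTS):
        print(loc, values)
    # server.c:120 ('10.0.0.5',)
    # server.c:245 ('7', '500')
    # None ()
```

This sketch stops at the first matching format and returns recovered values as strings. The further step described in the abstract, reasoning over the source code about which paths must or may have executed between matched logging points, is not shown.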