Combining Distributed and Kernel Tracing for Performance Analysis of Cloud Applications
Open Access
- 26 October 2021
- journal article
- research article
- Published by MDPI AG in Electronics
- Vol. 10 (21), 2610
- https://doi.org/10.3390/electronics10212610
Abstract
Distributed tracing allows tracking user requests that span multiple services and machines in a distributed application. However, typical cloud applications rely on abstraction layers that can hide the root cause of latency occurring between processes or in the kernel. Because they focus on high-level events, existing methodologies for applying distributed tracing can be limited when trying to detect complex contentions and relate them back to the originating requests. Cross-level analyses that include kernel-level events are necessary to debug problems as prevalent as mutex or disk contention; however, associating kernel events with distributed tracing data is complex and can add significant overhead. This paper describes a new solution for combining distributed tracing with low-level software tracing in order to better identify the root cause of latency. We explain how we achieve a hybrid trace collection to capture and synchronize both kernel and distributed request events. Then, we present our design and implementation for a critical path analysis. We show that our analysis describes precisely how each request spends its time and what stands in its critical path, while limiting overhead.