Anomaly Detection and Diagnosis Algorithms for Discrete Symbol Sequences with Applications to Airline Safety
- 2 December 2008
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Systems, Man and Cybernetics, Part C (Applications and Reviews)
- Vol. 39 (1), 101-113
- https://doi.org/10.1109/tsmcc.2008.2007248
Abstract
We present a set of novel algorithms, which we call sequenceMiner, that detect and characterize anomalies in large sets of high-dimensional symbol sequences that arise from recordings of switch sensors in the cockpits of commercial airliners. While the algorithms that we present are general and domain-independent, we focus on a specific problem that is critical to determining the system-wide health of a fleet of aircraft. The approach taken uses unsupervised clustering of sequences using the normalized length of the longest common subsequence as a similarity measure, followed by detailed outlier analysis to detect anomalies. In this method, an outlier sequence is defined as a sequence that is far away from the cluster center. We present new algorithms for outlier analysis that provide comprehensible indicators as to why a particular sequence is deemed to be an outlier. The algorithms provide a coherent description to an analyst of the anomalies in the sequence when compared to more normal sequences. In the final section of the paper, we demonstrate the effectiveness of sequenceMiner for anomaly detection on a real set of discrete-sequence data from a fleet of commercial airliners. We show that sequenceMiner discovers actionable and operationally significant safety events. We also compare our innovations with standard hidden Markov models, and show that our methods are superior.
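The similarity measure and outlier definition described in the abstract can be illustrated with a minimal Python sketch. It is not the sequenceMiner implementation: the function names (`lcs_length`, `nlcs`, `medoid_and_scores`) are hypothetical, the normalization by the geometric mean of the two sequence lengths is an assumption about how "normalized length" is defined, and a single medoid stands in for the cluster center in place of the paper's clustering step.

```python
from math import sqrt


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b.

    Standard dynamic programme that keeps only two rows, so it uses
    O(len(a) * len(b)) time and O(len(b)) space.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def nlcs(a, b):
    """Normalized LCS similarity in [0, 1].

    Assumption: normalize by sqrt(len(a) * len(b)); identical sequences
    score 1.0 and sequences with no common symbol score 0.0.
    """
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / sqrt(len(a) * len(b))


def medoid_and_scores(seqs):
    """Pick a medoid as a stand-in cluster center and score every sequence.

    The medoid is the sequence with the highest total similarity to all
    sequences. The outlier score is 1 - nLCS to the medoid, so larger
    values mean "farther from the cluster center".
    """
    totals = [sum(nlcs(s, t) for t in seqs) for s in seqs]
    m = max(range(len(seqs)), key=totals.__getitem__)
    scores = [1.0 - nlcs(s, seqs[m]) for s in seqs]
    return m, scores


# Toy symbol sequences; each letter stands for one discrete switch event.
flights = ["ABCD", "ABCD", "ABCE", "XYZW"]
center, scores = medoid_and_scores(flights)
```

In this toy data the last sequence shares no symbols with the others, so it receives the largest score and would be the one flagged for further diagnosis. The sequences may equally be lists of event names rather than strings, since the functions only rely on indexing and equality.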