On the Improvement of the Isolation Forest Algorithm for Outlier Detection with Streaming Data
Open Access
- 24 June 2021
- journal article
- research article
- Published by MDPI AG in Electronics
- Vol. 10 (13), 1534
- https://doi.org/10.3390/electronics10131534
Abstract
In recent years, detecting anomalies in real-world computer networks has become an increasingly challenging task due to the steady increase of high-volume, high-speed and high-dimensional streaming data, for which ground truth information is not available. Efficient detection schemes applied on networked embedded devices need to be fast and memory-efficient, and must be capable of dealing with concept drifts when they occur. Different approaches for unsupervised online outlier detection (OD) have been designed to deal with these circumstances in order to reliably detect malicious activity. In this paper, we introduce a novel framework called PCB-iForest which, in its generalized form, is able to incorporate any ensemble-based online OD method so that it can function on streaming data. The most popular state-of-the-art online methods are compared against carefully engineered requirements, with an in-depth focus on variants based on the widely accepted isolation forest algorithm, thereby highlighting the lack of a flexible and efficient solution, a gap that PCB-iForest fills. Therefore, we integrate two variants into PCB-iForest: an isolation forest improvement called extended isolation forest, and a classic isolation forest variant equipped with the functionality to score features according to their contributions to a sample's anomalousness. Extensive experiments were performed on 23 different multi-disciplinary and security-related real-world datasets in order to comprehensively evaluate the performance of our implementation compared with off-the-shelf methods. The discussion of results, including classification scores and the averaged execution time metric, shows that PCB-iForest clearly outperformed the state-of-the-art competitors in 61% of cases and even achieved more promising results in terms of the tradeoff between classification and computational costs.
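The abstract does not describe the internals of PCB-iForest, so the following is not the authors' method. It is only a minimal sketch of the general idea the abstract builds on: scoring streaming samples with a classic isolation forest that is periodically rebuilt from a sliding window of recent data. The class name `WindowedIsolationForest`, its parameters, and the rebuild-every-full-window policy are assumptions made for illustration; a drift-aware method would presumably replace that fixed policy with a drift detector.

```python
import math
import random
from collections import deque

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, x, depth=0):
    """Depth at which x lands in a leaf, adjusted for the unsplit points left there."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, dim, split, left, right = tree
    return path_length(left if x[dim] < split else right, x, depth + 1)


class WindowedIsolationForest:
    """Hypothetical helper: isolation forest refit on each full sliding window."""

    def __init__(self, n_trees=50, sample_size=64, window_size=256, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.window_size = window_size  # assumed to be at least 2
        self.window = deque(maxlen=window_size)
        self.rng = random.Random(seed)
        self.trees = []
        self.seen = 0
        self._k = 0

    def _rebuild(self):
        data = list(self.window)
        self._k = min(self.sample_size, len(data))
        max_depth = math.ceil(math.log2(max(self._k, 2)))
        self.trees = [
            build_tree(self.rng.sample(data, self._k), 0, max_depth, self.rng)
            for _ in range(self.n_trees)
        ]

    def score(self, x):
        """Anomaly score in (0, 1); values closer to 1 indicate easier isolation."""
        mean_depth = sum(path_length(t, x) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_depth / c_factor(self._k))

    def process(self, x):
        """Score x with the current forest (None during warm-up), then update the window."""
        result = self.score(x) if self.trees else None
        self.window.append(x)
        self.seen += 1
        if self.seen % self.window_size == 0:
            self._rebuild()
        return result
```

In use, each incoming sample would be passed to `process`, which returns `None` until the first window has filled and a score afterwards. Rebuilding the whole forest on a fixed schedule is the simplest possible update rule and is chosen here only to keep the sketch short.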
Funding Information
- Bundesministerium für Bildung und Forschung (13FH645IB6)
- Ministerstvo Školství, Mládeže a Tělovýchovy (LO1506)