Malicious URLs detection using data streaming algorithms

Abstract
As a result of advancements in technology and technological devices, data is now spawned at an infinite rate, emanating from a vast array of networks, devices, and daily operations like credit card transactions and mobile phones. Datastream entails sequential and real-time continuous data in the inform of evolving stream. However, the traditional machine learning approach is characterized by a batch learning model. Labeled training data are given apriori to train a model based on some machine learning algorithms. This technique necessitates the entire training sample to be readily accessible before the learning process. The training procedure is mainly done offline in this setting due to the high training cost. Consequently, the traditional batch learning technique suffers severe drawbacks, such as poor scalability for real-time phishing websites detection. The model mostly requires re-training from scratch using new training samples. This paper presents the application of streaming algorithms for detecting malicious URLs based on selected online learners: Hoeffding Tree (HT), Naïve Bayes (NB), and Ozabag. Ozabag produced promising results in terms of accuracy, Kappa and Kappa Temp on the dataset with large samples while HT and NB have the least prediction time with comparable accuracy and Kappa with Ozabag algorithm for the real-time detection of phishing websites.
Funding Information
  • University of Ilorin

This publication has 14 references indexed in Scilit: