Malicious URLs detection using data streaming algorithms

Open Access

Abstract

Advances in technology and connected devices mean that data is now generated continuously and at very high rates by a vast array of networks, devices, and everyday activities such as credit card transactions and mobile phone use. A data stream is a sequential, continuous, real-time flow of data that evolves over time. Traditional machine learning, however, follows a batch learning model: labeled training data are supplied a priori to train a model with a given learning algorithm. This requires the entire training sample to be available before learning begins, and training is usually done offline because of its high cost. Consequently, batch learning suffers severe drawbacks, such as poor scalability for real-time phishing website detection, and the model typically has to be retrained from scratch when new training samples arrive. This paper presents the application of streaming algorithms to malicious URL detection using selected online learners: Hoeffding Tree (HT), Naïve Bayes (NB), and OzaBag. OzaBag produced promising results in terms of accuracy, Kappa, and Kappa Temporal (Kappa Temp) on the dataset with large samples, while HT and NB had the lowest prediction time, with accuracy and Kappa comparable to OzaBag, for the real-time detection of phishing websites.
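To make the contrast with batch learning concrete, the following is a minimal sketch of the online (test-then-train) setting, in which each example is first used for prediction and then for a single incremental update, so no retraining from scratch is needed. It is not the paper's implementation: the incremental Bernoulli Naïve Bayes learner, the `url_features` helper, and its four lexical features are hypothetical choices made for illustration only, and Kappa is computed here as plain Cohen's kappa over the whole stream.

```python
import math
from collections import defaultdict


def url_features(url):
    """Hypothetical binary lexical features; not the paper's feature set."""
    return [
        int("@" in url),                  # '@' appears in the URL
        int(url.count(".") > 3),          # unusually many dots
        int(len(url) > 60),               # unusually long URL
        int(url.startswith("https://")),  # served over HTTPS
    ]


class OnlineBernoulliNB:
    """Incremental Naive Bayes over binary features (Laplace smoothing)."""

    def __init__(self, n_features):
        self.class_counts = defaultdict(int)
        self.feature_counts = defaultdict(lambda: [0] * n_features)

    def predict(self, x):
        if not self.class_counts:
            return 0  # arbitrary default before any example has been seen
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            for j, v in enumerate(x):
                p = (self.feature_counts[label][j] + 1) / (count + 2)
                score += math.log(p if v else 1 - p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def learn(self, x, y):
        """Update the counts with one labeled example; no retraining."""
        self.class_counts[y] += 1
        counts = self.feature_counts[y]
        for j, v in enumerate(x):
            if v:
                counts[j] += 1


def prequential(stream, model):
    """Test-then-train: predict each example first, then learn from it."""
    y_true, y_pred = [], []
    for x, y in stream:
        y_pred.append(model.predict(x))
        y_true.append(y)
        model.learn(x, y)
    return y_true, y_pred


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement between predictions and labels, corrected for chance."""
    n = len(y_true)
    p0 = accuracy(y_true, y_pred)
    labels = set(y_true) | set(y_pred)
    pe = sum((y_true.count(l) / n) * (y_pred.count(l) / n) for l in labels)
    return 0.0 if pe == 1.0 else (p0 - pe) / (1.0 - pe)


if __name__ == "__main__":
    # Toy labeled stream (1 = malicious, 0 = benign); the URLs are made up.
    urls = [
        ("http://a.b.c.d.example.com/login@x", 1),
        ("https://example.org/docs", 0),
    ] * 10
    stream = [(url_features(u), y) for u, y in urls]
    y_true, y_pred = prequential(stream, OnlineBernoulliNB(n_features=4))
    print("accuracy:", accuracy(y_true, y_pred))
    print("kappa:", cohen_kappa(y_true, y_pred))
```

The same `prequential` loop would apply unchanged to any learner exposing `predict` and `learn`, such as a Hoeffding Tree or an OzaBag ensemble; only the per-example update differs between them.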

Funding Information

University of Ilorin
