Detecting Electronic Banking Fraud on Highly Imbalanced Data using Hidden Markov Models

Abstract
Recent research has shown that Machine Learning (ML) techniques can effectively detect fraud in electronic banking transactions, since they have the potential to detect new and unknown intrusions. A major challenge in applying ML to fraud detection is the presence of highly imbalanced data sets. In many available datasets, the majority of transactions are genuine, with an extremely small percentage of fraudulent ones. Designing an accurate and efficient fraud detection system that produces few false positives yet detects fraudulent activity effectively remains a significant challenge for researchers. In this paper, a framework based on Hidden Markov Models (HMM), a modified Density Based Spatial Clustering of Applications with Noise (DBSCAN) and the Synthetic Minority Oversampling Technique (SMOTE) is proposed to effectively detect fraud in a highly imbalanced electronic banking dataset. The proposed model takes into consideration the various transaction types, the transaction amounts and the frequency of transactions to enable effective detection. With different numbers of hidden states for the proposed HMMs, simulations are performed for four different approaches, and their performances are compared using precision, recall and F1-Score as the evaluation metrics. The study revealed that our proposed approach is able to detect fraudulent transactions more effectively, with a reasonably low number of false positives.
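
As an illustration of why precision, recall and F1-Score are used for evaluation on imbalanced data rather than accuracy, the following minimal Python sketch computes them on a toy label set. The data (1000 transactions, 10 of them fraudulent) and the function names are assumptions made for this example and are not taken from the paper.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for the positive class (1 = fraud)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical toy data: 10 fraudulent (1) and 990 genuine (0) transactions.
y_true = [1] * 10 + [0] * 990

# A trivial detector that never flags fraud.
y_none = [0] * 1000

# A detector that catches 8 of the 10 frauds and raises 4 false alarms.
y_det = [1] * 8 + [0] * 2 + [1] * 4 + [0] * 986

print(accuracy(y_true, y_none), precision_recall_f1(y_true, y_none))
print(accuracy(y_true, y_det), precision_recall_f1(y_true, y_det))
```

In this toy setting the trivial detector reaches 99% accuracy while its recall and F1-Score are zero, which is why accuracy alone says little about fraud detection quality when genuine transactions dominate.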
