Improved TrAdaBoost and its Application to Transaction Fraud Detection

27 August 2020

journal article
research article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Computational Social Systems

Vol. 7 (5), 1304-1316
https://doi.org/10.1109/tcss.2020.3017013

Abstract

AdaBoost is a boosting-based machine learning method that assumes the training and testing data share the same distribution and input feature space. During training, it increases the weights of instances that are wrongly classified. However, this assumption does not hold for many real-world data sets. AdaBoost has therefore been extended to transfer AdaBoost (TrAdaBoost), which can effectively transfer knowledge from one domain to another. TrAdaBoost decreases the weights of source-domain instances that are wrongly classified during training, which makes it more suitable when the source and target data follow different distributions. Can it be improved for special transfer scenarios, e.g., when the data distribution changes slightly over time? We find that the distribution of credit card transaction data changes as users' transaction behaviors change, but that these changes are slow most of the time. They are nevertheless important for detecting transaction fraud because they give rise to the so-called concept drift problem. To make TrAdaBoost more suitable for this case, we propose an improved TrAdaBoost (ITrAdaBoost) in this article. It updates (i.e., increases or decreases) the weight of a wrongly classified source-domain instance according to the distribution distance from the instance to the target domain, where the distance is calculated based on the theory of reproducing kernel Hilbert spaces. We conduct a series of experiments on five data sets, and the results illustrate the advantage of ITrAdaBoost.
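
The weight updates described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The AdaBoost-style and TrAdaBoost-style updates follow the commonly cited formulations of those algorithms. The distance is one possible RKHS quantity, the squared distance between an instance's kernel embedding and the mean embedding of the target sample under a Gaussian kernel. The function `distance_aware_update`, its `threshold` and `factor` parameters, and the choice of kernel are hypothetical and serve only to show how a weight might go up or down depending on distance; the abstract does not give the actual ITrAdaBoost formula.

```python
import numpy as np


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def rkhs_distance_to_target(source, target, gamma=1.0):
    """Squared RKHS distance from each source instance to the target mean embedding.

    ||phi(x) - mu_T||^2 = k(x, x) - 2 * mean_j k(x, t_j) + mean_ij k(t_i, t_j)
    Assumption: this is one possible distance, not necessarily the paper's.
    """
    k_xx = np.ones(len(source))  # k(x, x) = 1 for the RBF kernel
    k_xt = rbf_kernel(source, target, gamma).mean(axis=1)
    k_tt = rbf_kernel(target, target, gamma).mean()
    return k_xx - 2.0 * k_xt + k_tt


def adaboost_update(weights, misclassified, error):
    """AdaBoost-style step: grow the weights of misclassified instances."""
    beta_t = error / (1.0 - error)  # requires 0 < error < 0.5
    return weights * np.power(beta_t, -misclassified.astype(float))


def tradaboost_source_update(weights, misclassified, n_source, n_rounds):
    """TrAdaBoost-style step: shrink the weights of misclassified source instances."""
    beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n_source) / n_rounds))
    return weights * np.power(beta, misclassified.astype(float))


def distance_aware_update(weights, misclassified, distances, threshold, factor=0.5):
    """HYPOTHETICAL rule, not taken from the paper.

    A misclassified source instance close to the target domain gains weight;
    one far from the target domain loses weight. Other weights are unchanged.
    """
    scale = np.where(distances <= threshold, 1.0 / factor, factor)
    return np.where(misclassified, weights * scale, weights)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(0.0, 1.0, size=(50, 2))
    # Five source points near the target distribution, five far from it.
    source = np.vstack([
        rng.normal(0.0, 1.0, size=(5, 2)),
        rng.normal(4.0, 1.0, size=(5, 2)),
    ])
    distances = rkhs_distance_to_target(source, target, gamma=0.5)
    weights = np.full(len(source), 1.0 / len(source))
    misclassified = np.ones(len(source), dtype=bool)
    updated = distance_aware_update(
        weights, misclassified, distances, threshold=np.median(distances)
    )
    print(np.round(distances, 3))
    print(np.round(updated, 3))
```

In this sketch, the classic TrAdaBoost step treats every misclassified source instance the same way, whereas the hypothetical distance-aware step lets source instances that resemble the target domain keep or gain influence, which matches the slowly drifting setting the abstract describes.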

Funding Information

National Key Research and Development Program of China (2018YFB2100801)
Fundamental Research Funds for the Central Universities of China (22120190198)

Cited by 58 articles