Confusion-Matrix-Based Kernel Logistic Regression for Imbalanced Data Classification
Open Access
- 15 March 2017
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Knowledge and Data Engineering
- Vol. 29 (9), 1806-1819
- https://doi.org/10.1109/tkde.2017.2682249
Abstract
There have been many attempts to classify imbalanced data, since this classification is critical in a wide variety of applications related to the detection of anomalies, failures, and risks. Many conventional methods, which can be categorized as sampling, cost-sensitive, or ensemble approaches, include heuristic and task-dependent processes. To achieve better classification performance through a formulation free of heuristics and task dependence, we propose confusion-matrix-based kernel logistic regression (CM-KLOGR). Its objective function is the harmonic mean of several evaluation criteria derived from a confusion matrix, such as sensitivity, positive predictive value, and their counterparts for negatives. This objective function and its optimization are consistently formulated within the framework of KLOGR, based on minimum classification error and generalized probabilistic descent (MCE/GPD) learning. Owing to the merits of the harmonic mean, KLOGR, and MCE/GPD, CM-KLOGR improves multifaceted performance in a well-balanced way. This paper presents the formulation of CM-KLOGR and demonstrates its effectiveness through experiments that comparatively evaluate CM-KLOGR on benchmark imbalanced datasets.
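To illustrate the kind of objective the abstract describes, the following is a minimal Python sketch that computes the harmonic mean of four confusion-matrix criteria from hard 0/1 predictions. It is not the paper's formulation: the abstract names only sensitivity and positive predictive value explicitly, so treating specificity and negative predictive value as the criteria "for negatives" is an assumption, as are the function names and the convention that an undefined ratio counts as 0. CM-KLOGR itself optimizes such an objective within KLOGR via MCE/GPD learning, which this count-based sketch does not attempt.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def criteria(tp, fp, fn, tn):
    """Four confusion-matrix criteria (assumed set; see the lead-in)."""
    def ratio(num, den):
        # Assumption: a ratio with an empty denominator is scored as 0.
        return num / den if den else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),
        "ppv": ratio(tp, tp + fp),
        "specificity": ratio(tn, tn + fp),
        "npv": ratio(tn, tn + fn),
    }


def harmonic_mean_score(y_true, y_pred):
    """Harmonic mean of the criteria; 0 if any single criterion is 0."""
    values = list(criteria(*confusion_counts(y_true, y_pred)).values())
    if any(v == 0 for v in values):
        return 0.0
    return len(values) / sum(1.0 / v for v in values)


# Hypothetical imbalanced example: 2 positives among 10 samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10                       # 80% accurate, finds no positives
some_positives = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]  # 1 hit, 1 miss, 1 false alarm

score_trivial = harmonic_mean_score(y_true, always_negative)
score_mixed = harmonic_mean_score(y_true, some_positives)
```

In this hypothetical example, the classifier that always predicts the negative class is 80% accurate but scores 0, because its sensitivity is 0 and the harmonic mean is dominated by the weakest criterion. That property is one reading of why such an objective would favor performance that is well balanced across criteria.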
Funding Information
- JSPS (15K00323)
- MEXT
- Strategic Research Foundation at Private Universities (2014-2018)