SVMs Modeling for Highly Imbalanced Classification
- 9 December 2008
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)
- Vol. 39 (1), 281-288
- https://doi.org/10.1109/tsmcb.2008.2002909
Abstract
Traditional classification algorithms can be limited in their performance on highly unbalanced data sets. A popular stream of work for countering the problem of class imbalance has been the application of sundry sampling strategies. In this paper, we focus on designing modifications to support vector machines (SVMs) to appropriately tackle the problem of class imbalance. We incorporate different "rebalance" heuristics in SVM modeling, including cost-sensitive learning, and over- and undersampling. These SVM-based strategies are compared with various state-of-the-art approaches on a variety of data sets by using various metrics, including G-mean, area under the receiver operating characteristic curve, F-measure, and area under the precision/recall curve. We show that we are able to surpass or match the previously known best algorithms on each data set. In particular, of the four SVM variations considered in this paper, the novel granular SVMs-repetitive undersampling algorithm (GSVM-RU) is the best in terms of both effectiveness and efficiency. GSVM-RU is effective, as it can minimize the negative effect of information loss while maximizing the positive effect of data cleaning in the undersampling process. GSVM-RU is efficient because it extracts far fewer support vectors and, hence, greatly speeds up SVM prediction.
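Two of the metrics named in the abstract, G-mean and F-measure, can be computed directly from confusion counts. The sketch below is illustrative only and is not taken from the paper: it assumes the common definitions (G-mean as the geometric mean of sensitivity and specificity, F-measure as the weighted harmonic mean of precision and recall), and the function names and toy labels are hypothetical.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn), treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sqrt(sensitivity * specificity)


def f_measure(y_true, y_pred, positive=1, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy data: 95 negatives followed by 5 positives (hypothetical, for illustration).
y_true = [0] * 95 + [1] * 5

# Classifier A predicts the majority class everywhere.
pred_majority = [0] * 100

# Classifier B raises 10 false alarms but finds 4 of the 5 positives.
pred_balanced = [1] * 10 + [0] * 85 + [1] * 4 + [0] * 1

for name, pred in [("majority", pred_majority), ("balanced", pred_balanced)]:
    print(name, accuracy(y_true, pred), g_mean(y_true, pred), f_measure(y_true, pred))
```

On this toy data the majority-class predictor has the higher accuracy (0.95 versus 0.89) but a G-mean and F-measure of zero, which shows why accuracy alone is a poor guide on highly imbalanced data.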