A KNN Undersampling Approach for Data Balancing
Open Access
- 1 January 2015
- journal article
- research article
- Published by Scientific Research Publishing, Inc. in Journal of Intelligent Learning Systems and Applications
- Vol. 07 (04), 104-116
- https://doi.org/10.4236/jilsa.2015.74010
Abstract
In supervised learning, an imbalanced number of instances among the classes in a dataset can cause algorithms to misclassify an instance from the minority class as one from the majority class. To address this problem, the KNN algorithm provides a basis for other balancing methods. These balancing methods are revisited in this work, and a new and simple KNN undersampling approach is proposed. The experiments demonstrated that the KNN undersampling method outperformed other sampling methods. The proposed method also outperformed the results of other studies, which indicates that the simplicity of KNN can serve as a basis for efficient algorithms in machine learning and knowledge discovery.
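The abstract does not describe the undersampling procedure in detail. The sketch below illustrates one plausible reading of neighborhood-based undersampling, under the assumption that a majority-class instance is dropped when at least `t` of its `k` nearest neighbors belong to another class. The function name `knn_undersample`, the parameters `k` and `t`, and the toy data are hypothetical and are not taken from the paper.

```python
import math


def knn_undersample(X, y, majority_label, k=3, t=1):
    """Illustrative KNN-based undersampling (assumed rule, not the paper's code).

    A majority-class instance is removed when at least `t` of its `k`
    nearest neighbors (Euclidean distance) carry a different label.
    Instances of all other classes are always kept.
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != majority_label:
            keep.append(i)
            continue
        # Indices of the k nearest neighbors of instance i, excluding itself.
        neighbors = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )[:k]
        # Count neighbors that do not belong to the majority class.
        other = sum(1 for j in neighbors if y[j] != majority_label)
        if other < t:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: four majority points in one cluster, one majority point
# sitting next to the two minority points.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (4.8, 5.2), (5, 5), (5, 6)]
y = [0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = knn_undersample(X, y, majority_label=0, k=3, t=1)
print(y_bal)
```

In this toy example only the majority point at (4.8, 5.2) has minority-class neighbors, so it is the only instance removed. Raising `t` makes the rule more conservative, since more neighbors from other classes are then required before a majority instance is dropped.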