Classification of imbalanced data sets using Multi Objective Genetic Programming
- 1 January 2015
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in 2015 International Conference on Computer Communication and Informatics (ICCCI)
Abstract
Classifying imbalanced data sets is challenging because it is difficult to achieve good classification accuracy for every class. The problem arises in many real-world applications, such as diagnosis of rare diseases, fraud detection in the financial domain, and faulty-area detection in network troubleshooting. An imbalanced data set contains a small number of minority-class instances and a large number of majority-class instances. Overall classification accuracy is the ratio of correctly classified instances to the total number of instances in the data set. For imbalanced data sets, correctly classifying minority-class instances therefore contributes far less to overall accuracy than correctly classifying majority-class instances. Conventional classification techniques such as Artificial Neural Network (ANN), Support Vector Machine (SVM), and Naïve Bayes (NB) consider only the overall classification accuracy and thus produce biased classifiers on imbalanced data sets. However, minority-class instances may contain rare but important information in many real-world data sets, so a classification technique that provides good accuracy on both minority and majority classes is needed. This paper proposes combining Multi Objective Genetic Programming (MOGP) with a probability-based Gaussian classifier for the classification of imbalanced data sets. MOGP treats the classification accuracy of each class as a separate objective rather than treating overall accuracy as a single objective. The Gaussian classifier is a generative classifier in which the distribution of one class never affects the classification of instances of other classes. The proposed methodology is applied to imbalanced data sets from the medical, life science, automobile, and space science domains. The results suggest that the MOGP classifier outperformed the conventional classifiers (ANN, SVM, and NB) on the tested imbalanced data sets.
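The contrast the abstract draws between overall accuracy and per-class accuracy can be illustrated with a minimal Python sketch. This is not the paper's implementation: the data (95 majority and 5 minority instances), the two hypothetical classifiers, and the function names `overall_accuracy`, `per_class_accuracy`, and `dominates` are all assumptions made for illustration.

```python
from collections import Counter

def overall_accuracy(y_true, y_pred):
    """Ratio of correctly classified instances to all instances."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def per_class_accuracy(y_true, y_pred):
    """Accuracy computed separately for each class (one objective per class)."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives maximized)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

# Hypothetical imbalanced data set: 95 majority, 5 minority instances.
y_true = ["maj"] * 95 + ["min"] * 5

# Hypothetical classifier A: always predicts the majority class.
pred_a = ["maj"] * 100
# Hypothetical classifier B: misses 5 majority and 1 minority instance.
pred_b = ["maj"] * 90 + ["min"] * 5 + ["min"] * 4 + ["maj"] * 1

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, overall_accuracy(y_true, pred), per_class_accuracy(y_true, pred))

# Objective vectors ordered as (majority accuracy, minority accuracy).
obj_a = tuple(per_class_accuracy(y_true, pred_a)[c] for c in ("maj", "min"))
obj_b = tuple(per_class_accuracy(y_true, pred_b)[c] for c in ("maj", "min"))
print(dominates(obj_a, obj_b), dominates(obj_b, obj_a))
```

In this sketch, classifier A has the higher overall accuracy even though it never detects the minority class. Neither per-class objective vector dominates the other, so under these assumptions a multi-objective search would retain both as trade-off solutions instead of discarding B.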