An analysis of machine learning techniques (J48 & AdaBoost) for classification

Abstract
Extracting relevant information from data is a challenging task. Analysts often end up with an erroneous classifier because the data are huge, redundant, unreliable, and noisy, or because results are misinterpreted and inappropriate techniques are applied to a given situation. In this study, we investigate two main approaches in data mining: the Decision Tree (J48 algorithm) and an ensemble learning technique (AdaBoost), and we perform a comparative analysis of the two. The tool used is WEKA, an open-source software package. The datasets used are supermarket.arff, labor.arff, soybean.arff, and segment.arff, where ARFF stands for Attribute-Relation File Format, the standard dataset format accepted by WEKA. The size of each training set is determined by the number of attributes and the number of instances in the dataset. The classification models are assessed on the basis of the number of class labels in the dataset, accuracy, the number and length of the generated rules, error rate, and standard deviation. Based on the experiments performed, the results show that AdaBoost provides better accuracy than the Decision Tree (J48) when the dataset has exactly two class labels, whereas the Decision Tree generates rules faster than AdaBoost; when the dataset has more than two class labels, J48 performs better than AdaBoost. These results illustrate the trade-off between the two approaches and can help other researchers find the best model for a particular problem.
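The abstract does not specify the exact evaluation procedure, so the following is only a minimal sketch of how such a comparison could be reproduced with WEKA's Java API, assuming 10-fold cross-validation, a locally available copy of one of the listed datasets (labor.arff), the class label as the last attribute, and AdaBoostM1 with its default base learner; none of these choices are confirmed by the paper.

```java
import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class J48VsAdaBoost {
    public static void main(String[] args) throws Exception {
        // Load one of the ARFF datasets mentioned in the study (path is illustrative).
        Instances data = DataSource.read("labor.arff");
        // Assumption: the class label is the last attribute, as is conventional in WEKA datasets.
        data.setClassIndex(data.numAttributes() - 1);

        // Decision tree learner (J48 is WEKA's implementation of C4.5).
        J48 tree = new J48();

        // Boosted ensemble; AdaBoostM1 uses its default base learner here.
        AdaBoostM1 boost = new AdaBoostM1();

        // Evaluate both classifiers with 10-fold cross-validation (fold count and seed are assumptions).
        for (Classifier clf : new Classifier[] { tree, boost }) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(clf, data, 10, new Random(1));
            System.out.printf("%s: accuracy = %.2f%%, error rate = %.4f%n",
                    clf.getClass().getSimpleName(), eval.pctCorrect(), eval.errorRate());
        }
    }
}
```

Running this sketch with weka.jar on the classpath prints cross-validated accuracy and error rate for each classifier, which corresponds to the accuracy and error-rate criteria the study reports.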
