ENHANCEMENT OF DECISION TREE METHOD BASED ON HIERARCHICAL CLUSTERING AND DISPERSION RATIO

Abstract
Decision tree classification is a classification method with a built-in feature selection process. Decision trees that use information gain as the splitting criterion perform poorly when the dataset contains attributes with many unique values per record and an imbalanced class distribution. The data used for decision tree classification are of two types, numerical and nominal; numerical attributes undergo a discretization process that produces data intervals. The weakness of the information gain method can be reduced by using the dispersion ratio method, which depends on the frequency distribution rather than the class distribution. Numerical attributes are discretized using hierarchical clustering to obtain balanced data clusters. The data used in this study were taken from the UCI machine learning repository and contain both numerical and nominal attributes. This research has two stages: first, numerical attributes are discretized using hierarchical clustering with three linkage methods, namely single link, complete link, and average link; second, the discretized attributes are merged back with the nominal attributes, the tree is built with the dispersion ratio as the attribute-splitting criterion, and the model is evaluated with 7-fold cross-validation. The results show that discretization with hierarchical clustering increases prediction accuracy by 14.6% compared with data without discretization, and attribute splitting with the dispersion ratio on the hierarchically discretized data increases prediction accuracy by 6.51%.
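The two ingredients of the pipeline can be sketched as follows. The discretization function follows the abstract's description (bottom-up hierarchical clustering of a 1-D numeric attribute with single, complete, or average linkage, with interval boundaries placed between the resulting clusters). The dispersion_ratio function is an assumption: the abstract states only that the criterion depends on the frequency distribution, so the common arithmetic-mean/geometric-mean form over attribute-value frequencies is used here for illustration. Function names and the number of intervals k are likewise illustrative, not taken from the paper.

```python
import math

def agglomerative_discretize(values, k, linkage="average"):
    """Merge 1-D points bottom-up until k clusters remain; return interval edges."""
    # Start with each distinct value as its own cluster, in sorted order.
    clusters = [[v] for v in sorted(set(values))]
    while len(clusters) > k:
        # Find the pair of adjacent clusters with the smallest linkage distance.
        # (For sorted 1-D clusters, the closest pair under single, complete,
        # or average linkage is always an adjacent pair.)
        best_i, best_d = 0, float("inf")
        for i in range(len(clusters) - 1):
            a, b = clusters[i], clusters[i + 1]
            if linkage == "single":
                d = b[0] - a[-1]        # distance between nearest members
            elif linkage == "complete":
                d = b[-1] - a[0]        # distance between farthest members
            else:                       # average link
                d = sum(y - x for x in a for y in b) / (len(a) * len(b))
            if d < best_d:
                best_i, best_d = i, d
        clusters[best_i:best_i + 2] = [clusters[best_i] + clusters[best_i + 1]]
    # Interval boundaries: midpoints between consecutive clusters.
    return [(a[-1] + b[0]) / 2 for a, b in zip(clusters, clusters[1:])]

def to_interval(value, edges):
    """Map a raw numeric value to its discrete interval index."""
    return sum(value > e for e in edges)

def dispersion_ratio(frequencies):
    """Assumed form: arithmetic mean / geometric mean of value frequencies.

    Equals 1.0 for a perfectly uniform frequency distribution and grows as
    the distribution becomes more uneven (by the AM-GM inequality).
    """
    freqs = [f for f in frequencies if f > 0]
    am = sum(freqs) / len(freqs)
    gm = math.exp(sum(math.log(f) for f in freqs) / len(freqs))
    return am / gm
```

For example, discretizing the attribute values [1, 2, 3, 10, 11, 12, 20] into k = 3 intervals with single linkage yields the clusters {1, 2, 3}, {10, 11, 12}, {20} and the cut points 6.5 and 16.0.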