DECISION TREE CONSTRUCTION FOR THE CASE OF LOW-INFORMATIVE FEATURES

Open Access

16 April 2019

journal article
research article
Published by National University "Zaporizhzhia Polytechnic" in Radio Electronics, Computer Science, Control

No. 1, pp. 122-131
https://doi.org/10.15588/1607-3274-2019-1-12

Abstract

Context. The problem of automating decision tree construction is addressed. The object of study is the decision tree. The subject of study is methods for building decision trees.

Objective. The purpose of the work is to create a method for constructing decision-tree models for data samples characterized by sets of individually low-informative features.

Method. A method for decision tree construction is proposed. For a given sample, the method determines the individual informativeness of each feature relative to the output feature. It also evaluates the relationships among the input features as their pairwise individual informativeness relative to each other. When forming the next node, the method first selects as a candidate the feature that gives the best partition over the whole feature set. It then sequentially searches among the features not yet selected for this node for the one that is individually most closely related to the candidate. For the resulting set of selected features, it iterates through the transformations from a given set, determines the partition quality for each transformation, selects the best transformation, and adds it to the node. Thus, when forming each node, the method tends to single out a group of the most closely interrelated features whose conversion into a scalar value provides the best partition of the subsample of instances that fall into that node. This makes it possible to reduce the size of the model and the branching of the tree, speed up instance recognition with the model, and improve the model's generalization properties and interpretability. The constructed decision tree can also be used to assess feature significance.

Results. The developed method has been implemented in software and investigated on the problem of classifying signals represented by sets of individually low-informative readings.
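The node-formation step described in the Method paragraph can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: informativeness is estimated here as mutual information over equal-width bins, partition quality is taken as the Gini decrease of the best single threshold, and the transformation set (sum, mean, diff, max, min), the `max_group` limit and all function names are hypothetical choices, since the abstract specifies none of them.

```python
from collections import Counter
from math import log2
from typing import Callable, Dict, List, Sequence, Tuple

Matrix = List[List[float]]

def entropy(values: Sequence) -> float:
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def binned(column: Sequence[float], bins: int = 4) -> List[int]:
    """Equal-width binning (an assumption) so continuous readings can enter an MI estimate."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0] * len(column)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in column]

def mutual_info(a: Sequence, b: Sequence) -> float:
    """I(A;B) = H(A) + H(B) - H(A,B) for two discrete sequences."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

def informativeness(X: Matrix, y: Sequence, bins: int = 4):
    """Individual MI of each feature with y, and the pairwise feature-feature MI matrix."""
    m = len(X[0])
    disc = [binned([row[j] for row in X], bins) for j in range(m)]
    individual = [mutual_info(d, y) for d in disc]
    pairwise = [[mutual_info(disc[a], disc[b]) for b in range(m)] for a in range(m)]
    return individual, pairwise

def gini(labels: Sequence) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split_gain(values: Sequence[float], y: Sequence) -> Tuple[float, float]:
    """Largest Gini decrease over all thresholds on one scalar: (gain, threshold)."""
    pairs = sorted(zip(values, y))
    n, parent = len(pairs), gini(y)
    best = (0.0, pairs[0][0])
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
        if gain > best[0]:
            best = (gain, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best

# Hypothetical transformation set: each maps the selected readings of one row to a scalar.
TRANSFORMS: Dict[str, Callable[[List[float]], float]] = {
    "sum": lambda v: sum(v),
    "mean": lambda v: sum(v) / len(v),
    "diff": lambda v: v[0] - sum(v[1:]),
    "max": lambda v: max(v),
    "min": lambda v: min(v),
}

def form_node(X: Matrix, y: Sequence, max_group: int = 3, bins: int = 4) -> dict:
    m = len(X[0])
    cols = [[row[j] for row in X] for j in range(m)]
    _, pairwise = informativeness(X, y, bins)
    # 1. Candidate: the single feature giving the best partition.
    cand = max(range(m), key=lambda j: best_split_gain(cols[j], y)[0])
    best_gain, best_thr = best_split_gain(cols[cand], y)
    best = {"features": [cand], "transform": "identity"}
    group = [cand]
    while len(group) < max_group and len(group) < m:
        # 2. Add the not-yet-selected feature most related to the candidate.
        rest = [j for j in range(m) if j not in group]
        group.append(max(rest, key=lambda j: pairwise[cand][j]))
        # 3. Try every transformation of the group; keep the best partition.
        for name, f in TRANSFORMS.items():
            z = [f([row[j] for j in group]) for row in X]
            gain, thr = best_split_gain(z, y)
            if gain > best_gain:
                best_gain, best_thr = gain, thr
                best = {"features": list(group), "transform": name}
    return {**best, "threshold": best_thr, "gain": best_gain}

if __name__ == "__main__":
    # Toy sample: x1 = -x0 + e and y = [e > 0], so neither reading alone separates
    # the classes well, but their sum does; x2 carries no information about y.
    X = [[1, 0, 1], [2, -3, 1], [3, -2, 2], [4, -5, 2],
         [5, -6, 1], [6, -5, 1], [7, -8, 2], [8, -7, 2]]
    y = [1, 0, 1, 0, 0, 1, 0, 1]
    print(form_node(X, y))
```

On this toy sample the sketch picks x1 as the candidate (its best single split has a Gini gain of only 1/6), pairs it with its most closely related feature x0, and finds that the sum of the two separates the classes perfectly (gain 0.5, threshold 0.0). A full tree would apply `form_node` recursively to the subsamples on each side of the threshold.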
Conclusions. The experiments have confirmed the efficiency of the proposed software and allow it to be recommended for practical use in diagnostics and automatic feature-based classification problems. Prospects for further research include creating parallel methods for decision tree construction based on the proposed method, optimizing its software implementations, and studying the proposed method experimentally on a wider set of practical problems.
