Tree-Structured Classification Via Generalized Discriminant Analysis

Abstract
The problem of constructing classification rules that can be represented as decision trees is considered. Each object to be classified has an associated x vector containing possibly incomplete covariate information. Tree construction is based on the information provided in a “learning sample” of objects with known class identities. The x vectors in the learning sample may have missing values as well. Procedures are proposed for each of the components of classifier construction, such as split selection, tree-size determination, treatment of missing values, and ranking of variables. The main idea is recursive application of linear discriminant analysis, with the variables at each stage being appropriately chosen according to the data and the type of splits desired. Standard statistical techniques used as basic building blocks include analysis of variance, linear and canonical discriminant analysis, and principal component analysis. A new method of tree-structured classification is obtained by assembling the pieces. This method can accommodate prior probabilities as well as unequal misclassification costs and can yield trees with univariate, linear combination, or linear combination with polar coordinate splits. The method is compared with the CART method of Breiman, Friedman, Olshen, and Stone (1984). Some of the operational differences are that the new method (a) can have multiple splits per node, (b) is nonrandomized, (c) uses a direct stopping rule, (d) handles missing values by estimation, (e) allows both ordered and unordered variables in the same linear combination split, (f) is not invariant of monotone transformations of the individual variables, and (g) is computationally faster. Simulation experiments suggest that the two methods have comparable classification accuracy. The Boston housing data are analyzed in a classification context for illustration.