Unbiased Recursive Partitioning: A Conditional Inference Framework
- 1 September 2006
- journal article
- research article
- Published by Taylor & Francis Ltd in Journal of Computational and Graphical Statistics
- Vol. 15 (3), 651-674
- https://doi.org/10.1198/106186006x133933
Abstract
Recursive binary partitioning is a popular tool for regression analysis. Two fundamental problems of exhaustive search procedures usually applied to fit such models have been known for a long time: overfitting and a selection bias towards covariates with many possible splits or missing values. While pruning procedures are able to solve the overfitting problem, the variable selection bias still seriously affects the interpretability of tree-structured regression models. For some special cases unbiased procedures have been suggested, however lacking a common theoretical foundation. We propose a unified framework for recursive partitioning which embeds tree-structured regression models into a well defined theory of conditional inference procedures. Stopping criteria based on multiple test procedures are implemented and it is shown that the predictive performance of the resulting trees is as good as the performance of established exhaustive search procedures. It turns out that the partitions and therefore the models induced by both approaches are structurally different, confirming the need for an unbiased variable selection. Moreover, it is shown that the prediction accuracy of trees with early stopping is equivalent to the prediction accuracy of pruned trees with unbiased variable selection. The methodology presented here is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Data from studies on glaucoma classification, node positive breast cancer survival and mammography experience are re-analyzed.
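The idea of separating variable selection from split search, with a multiplicity-adjusted test acting as the stopping rule, can be sketched for a single node as follows. This is a minimal illustration and not the paper's procedure: the absolute Pearson correlation as test statistic, the Monte Carlo permutation test, the Bonferroni adjustment, and the names `abs_corr`, `perm_pvalue`, and `select_variable` are all assumptions made for the example. It handles numeric covariates only and omits the subsequent search for a split point within the selected covariate.

```python
import random


def abs_corr(x, y):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def perm_pvalue(x, y, n_perm, rng):
    """Monte Carlo permutation p-value for association between x and y."""
    observed = abs_corr(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any link between x and y
        if abs_corr(x, y_perm) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement itself.
    return (hits + 1) / (n_perm + 1)


def select_variable(X, y, alpha=0.05, n_perm=199, seed=0):
    """Pick the covariate most strongly associated with y, or None to stop.

    X maps covariate names to lists of numeric values (assumed layout).
    P-values are Bonferroni-adjusted for the number of covariates tested.
    """
    rng = random.Random(seed)
    m = len(X)
    adjusted = {
        name: min(1.0, m * perm_pvalue(col, y, n_perm, rng))
        for name, col in X.items()
    }
    best = min(adjusted, key=adjusted.get)
    # Stopping rule: no covariate is significant after adjustment.
    if adjusted[best] > alpha:
        return None, adjusted
    return best, adjusted


# Hypothetical data: one informative covariate and one unrelated covariate.
signal = list(range(30))
noise = [(v * 13) % 30 for v in range(30)]
response = [2.0 * v + (v * 7) % 5 for v in signal]
chosen, pvals = select_variable({"signal": signal, "noise": noise}, response)
```

Because each covariate is judged by a p-value rather than by the best achievable split criterion, a covariate with many candidate split points gains no built-in advantage at the selection step; the same test doubles as the stopping criterion when nothing is significant.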