Variable selection and importance in presence of high collinearity: an application to the prediction of lean body mass from multi-frequency bioelectrical impedance
- 13 May 2020
- journal article
- research article
- Published by Taylor & Francis Ltd in Journal of Applied Statistics
- Vol. 48 (9), 1644-1658
- https://doi.org/10.1080/02664763.2020.1763930
Abstract
In prediction problems, both the response and the covariates may be highly correlated with a second group of influential regressors that can be considered background variables. An important challenge is to perform variable selection and importance assessment among the covariates in the presence of these variables. A clinical example is the prediction of lean body mass (response) from bioimpedance (covariates), where anthropometric measures play the role of background variables. We introduce a reduced dataset in which the variables are defined as the residuals with respect to the background, and perform variable selection and importance assessment in both linear and random forest models. Using a clinical dataset of multi-frequency bioimpedance, we show the effectiveness of this method in selecting the most relevant predictors of lean body mass beyond anthropometry.
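The residual-based reduction described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the data are synthetic, the helper name `residualize` is hypothetical, the background adjustment is an ordinary least-squares fit with an intercept, and the squared correlation between residuals is only a simple stand-in for the importance measures the authors apply in their linear and random forest models.

```python
import numpy as np


def residualize(A, Z):
    """Residuals of A (vector or matrix) after OLS regression on Z plus an intercept."""
    Z1 = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(Z1, A, rcond=None)
    return A - Z1 @ coef


rng = np.random.default_rng(0)
n = 500

# Synthetic background variables (stand-in for anthropometric measures).
Z = rng.normal(size=(n, 2))

# Synthetic covariates that are strongly driven by the background,
# so they are highly collinear with it and with each other.
X = Z @ rng.normal(size=(2, 3)) + 0.3 * rng.normal(size=(n, 3))

# Synthetic response: depends on the background and, beyond it, only on covariate 0.
y = Z @ np.array([2.0, -1.0]) + 1.5 * X[:, 0] + 0.1 * rng.normal(size=n)

# Reduced dataset: response and covariates replaced by their residuals
# with respect to the background.
X_res = residualize(X, Z)
y_res = residualize(y, Z)

# Simple importance score (an assumption, not the article's measure):
# squared correlation between each residualized covariate and the residualized response.
importance = np.array(
    [np.corrcoef(X_res[:, j], y_res)[0, 1] ** 2 for j in range(X.shape[1])]
)
ranking = np.argsort(importance)[::-1]

print("importance on reduced dataset:", np.round(importance, 3))
print("covariates ranked by importance:", ranking)
```

On the raw data all three covariates would correlate with the response through the shared background; working on the residuals leaves only the contribution each covariate makes beyond the background variables.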