How should variable selection be performed with multiply imputed data?
- 17 January 2008
- journal article
- research article
- Published by Wiley in Statistics in Medicine
- Vol. 27 (17), 3227-3246
- https://doi.org/10.1002/sim.3177
Abstract
Multiple imputation is a popular technique for analysing incomplete data. Given the imputed data and a particular model, Rubin's rules (RR) for estimating parameters and standard errors are well established. However, there are currently no guidelines for variable selection in multiply imputed data sets. The usual practice is to perform variable selection amongst the complete cases, a simple but inefficient and potentially biased procedure. Alternatively, variable selection can be performed by repeated use of RR, which is more computationally demanding. An approximation can be obtained by a simple ‘stacked’ method that combines the multiply imputed data sets into one and uses a weighting scheme to account for the fraction of missing data in each covariate. We compare these and other approaches using simulations based around a trial in community psychiatry. Most methods improve on the naïve complete-case analysis for variable selection, but importantly the type 1 error is only preserved if selection is based on RR, which is our recommended approach. Copyright © 2008 John Wiley & Sons, Ltd.
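As an illustration of what "selection based on RR" involves, the following is a minimal sketch of pooling a single regression coefficient across m imputed data sets with Rubin's rules and forming a Wald-type statistic from the pooled result. The function names `pool_rubin` and `wald_statistic` are hypothetical, the degrees-of-freedom formula is the classic large-sample one from Rubin (1987), and the abstract does not state which variant the authors used, so this should be read as an assumption rather than the paper's exact procedure.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets (hypothetical helper).

    estimates: per-imputation point estimates of the coefficient
    variances: per-imputation squared standard errors
    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    total = w + (1 + 1 / m) * b  # total variance under Rubin's rules

    # Classic Rubin (1987) degrees of freedom; infinite if imputations agree.
    if b == 0:
        df = float("inf")
    else:
        df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2

    return q_bar, total ** 0.5, df


def wald_statistic(estimates, variances):
    """Wald-type statistic (pooled estimate / pooled SE) and its df."""
    q_bar, se, df = pool_rubin(estimates, variances)
    return q_bar / se, df


if __name__ == "__main__":
    # Made-up numbers for three imputations, for illustration only.
    stat, df = wald_statistic([0.5, 0.7, 0.6], [0.04, 0.04, 0.04])
    print(f"Wald statistic = {stat:.3f} on {df:.1f} df")
```

In a backward-elimination loop, a statistic of this kind would be recomputed for every candidate covariate after each refit of the model in all m data sets, which is why the abstract describes the RR-based approach as more computationally demanding than the stacked approximation.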