Multiple Imputation for Missing Data via Sequential Regression Trees


Open Access

14 September 2010

Journal article (research article)
Published by Oxford University Press (OUP) in American Journal of Epidemiology

Vol. 172 (9), 1070-1076
https://doi.org/10.1093/aje/kwq260

Abstract

Multiple imputation is particularly well suited to handling missing data in large epidemiologic studies, because these studies typically support a wide range of analyses by many data users. Some of these analyses may involve complex modeling, including interactions and nonlinear relations. Identifying such relations and encoding them in imputation models, for example in the conditional regressions for multiple imputation via chained equations, can be a daunting task when there are large numbers of categorical and continuous variables. The authors present a nonparametric approach for implementing multiple imputation via chained equations that uses sequential regression trees as the conditional models. This approach has the potential to capture complex relations with minimal tuning by the data imputer. Using simulations, the authors demonstrate that in complex settings the method can produce more plausible imputations, and hence more reliable inferences, than the naive application of standard sequential regression imputation techniques. They apply the approach to impute missing values in data on adverse birth outcomes with more than 100 clinical and survey variables, and they evaluate the imputations using posterior predictive checks with several epidemiologic analyses of interest.
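To make the idea of chained equations with regression trees as the conditional models more concrete, here is a minimal sketch in plain Python. It rests on several assumptions of our own and is not the authors' implementation: only numeric variables are handled (the paper also covers categorical ones), the tree is a tiny hand-written CART-style regression tree rather than a library, and each missing value is imputed by drawing an observed donor value from the leaf the record falls into, using Bayesian bootstrap weights per leaf. The names `build_tree`, `draw_from_leaf`, `impute_once`, and `multiple_impute`, and the defaults `max_depth=3`, `min_leaf=5`, `n_iter=5`, are hypothetical.

```python
# Hypothetical sketch: chained-equations imputation with CART-style trees.
# Not the paper's code; numeric variables only, None marks a missing value.
import random
from typing import List, Optional

Row = List[Optional[float]]


def _sse(v: List[float]) -> float:
    """Sum of squared deviations from the mean (the split criterion)."""
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v)


def _leaf(y: List[float], rng: random.Random) -> dict:
    # Bayesian bootstrap weights: unnormalised Gamma(1, 1) draws, i.e. a
    # Dirichlet(1, ..., 1) draw up to scale (random.choices normalises).
    return {"values": list(y), "weights": [rng.gammavariate(1.0, 1.0) for _ in y]}


def build_tree(X, y, rng, depth=0, max_depth=3, min_leaf=5) -> dict:
    """Greedy regression tree: each split minimises the summed within-child SSE."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return _leaf(y, rng)
    best = None  # (sse, feature index, threshold)
    for f in range(len(X[0])):
        vals = sorted(set(row[f] for row in X))
        for lo, hi in zip(vals, vals[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            sse = _sse(left) + _sse(right)
            if best is None or sse < best[0]:
                best = (sse, f, t)
    if best is None:
        return _leaf(y, rng)
    _, f, t = best
    left_idx = [i for i, row in enumerate(X) if row[f] <= t]
    right_idx = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([X[i] for i in left_idx], [y[i] for i in left_idx],
                           rng, depth + 1, max_depth, min_leaf),
        "right": build_tree([X[i] for i in right_idx], [y[i] for i in right_idx],
                            rng, depth + 1, max_depth, min_leaf),
    }


def draw_from_leaf(tree: dict, x: List[float], rng: random.Random) -> float:
    """Route x to its leaf, then draw one observed donor value with the leaf's weights."""
    while "values" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return rng.choices(tree["values"], weights=tree["weights"], k=1)[0]


def impute_once(data: List[Row], rng: random.Random, n_iter: int = 5, **tree_args):
    """Produce one completed dataset by cycling tree-based conditional models."""
    n, p = len(data), len(data[0])
    filled = [row[:] for row in data]  # copy so the input is left untouched
    # Start each missing cell at a random draw from its column's observed values.
    for j in range(p):
        observed = [row[j] for row in data if row[j] is not None]
        for row in filled:
            if row[j] is None:
                row[j] = rng.choice(observed)
    for _ in range(n_iter):
        for j in range(p):  # one chained-equations sweep over the variables
            mis = [i for i in range(n) if data[i][j] is None]
            if not mis:
                continue
            obs = [i for i in range(n) if data[i][j] is not None]
            others = [k for k in range(p) if k != j]
            # Fit variable j on the current (partly imputed) values of all others.
            X = [[filled[i][k] for k in others] for i in obs]
            tree = build_tree(X, [filled[i][j] for i in obs], rng, **tree_args)
            for i in mis:
                filled[i][j] = draw_from_leaf(tree, [filled[i][k] for k in others], rng)
    return filled


def multiple_impute(data: List[Row], m: int = 5, seed: int = 0, **kwargs):
    """Return m completed datasets drawn with one seeded random generator."""
    rng = random.Random(seed)
    return [impute_once(data, rng, **kwargs) for _ in range(m)]
```

Each of the `m` completed datasets would then be analysed separately and the estimates pooled with Rubin's combining rules (not shown). Drawing observed donor values from a leaf, rather than using the leaf mean, keeps imputations within the observed range and preserves some of the within-leaf variability, which is one reason this variant is a common choice for tree-based imputation.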
