Dealing with missing information on covariates for excess mortality hazard regression models – Making the imputation model compatible with the substantive model

Abstract
Missing data is a common issue in epidemiological databases. Among the different ways of dealing with missing data, multiple imputation has become more widely available in common statistical software packages. However, incompatibility between the imputation and substantive models can lead to invalid inference. Such incompatibility arises when the associations between variables in the substantive model are not taken into account in the imputation models, or when the substantive model is itself nonlinear. Aiming to analyse population-based cancer survival data, we extended the substantive model compatible fully conditional specification (SMC-FCS) multiple imputation approach, proposed by Bartlett et al. in 2015, to accommodate excess hazard regression models. In a simulation study, the proposed approach was compared with the standard fully conditional specification (FCS) multiple imputation procedure and with complete-case analysis. The SMC-FCS approach produced unbiased estimates in both scenarios tested, whereas FCS produced biased estimates and poor empirical coverage probabilities. The SMC-FCS algorithm was then used to handle missing data in the evaluation of socioeconomic inequalities in survival among colorectal cancer patients diagnosed in the North Region of Portugal. The analysis using SMC-FCS showed a clearer trend towards higher excess hazards for patients from more deprived areas. The proposed algorithm was implemented in R and is provided as Supplementary Material.
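To make "compatible with the substantive model" concrete, the Python sketch below shows a single imputation draw for one missing binary covariate X. The draw comes from a distribution proportional to the covariate model P(X | z) multiplied by the survival likelihood implied by an excess hazard model. The sketch rests on several simplifying assumptions that are not taken from the paper. The total hazard is taken to be h(t | x) = h_pop(t) + h0 · exp(β·x), where h_pop is the expected (population) hazard, usually obtained from life tables. The baseline excess hazard h0 is constant. X depends on a single fully observed covariate z through a logistic model. All parameters are held fixed, whereas a full SMC-FCS run would redraw them at each iteration. The function name `impute_binary_covariate` and its arguments are hypothetical. This is not the authors' R implementation.

```python
import numpy as np


def impute_binary_covariate(t, d, z, h_pop, h0, beta, alpha, rng):
    """Draw imputations for a missing binary covariate X (hypothetical helper).

    Assumed substantive model: total hazard h(t | x) = h_pop(t) + h0 * exp(beta * x),
    i.e. an excess hazard model with a constant baseline excess hazard h0.
    Assumed covariate model: logit P(X = 1 | z) = alpha[0] + alpha[1] * z.
    Parameters are treated as fixed here; a full SMC-FCS run would redraw them
    from their approximate posterior at every iteration.
    """
    t, d, z, h_pop = (np.asarray(a, dtype=float) for a in (t, d, z, h_pop))

    def surv_lik(x):
        # Likelihood of (t, d) given X = x, up to factors that do not depend on x:
        # the expected cumulative hazard H_pop(t) is the same for x = 0 and x = 1,
        # so it cancels in the normalised probability below.
        excess = h0 * np.exp(beta * x)
        return (h_pop + excess) ** d * np.exp(-excess * t)

    pi1 = 1.0 / (1.0 + np.exp(-(alpha[0] + alpha[1] * z)))  # P(X = 1 | z)
    num = pi1 * surv_lik(1.0)
    p1 = num / (num + (1.0 - pi1) * surv_lik(0.0))           # P(X = 1 | t, d, z)
    return rng.binomial(1, p1), p1
```

One quick sanity check is to set `beta = 0`. The survival term then cancels and `p1` reduces to the covariate-model probability, which is also what a standard imputation model would use if it ignored the outcome. For a binary covariate the conditional distribution can be normalised directly, as here. Continuous covariates generally require a rejection-sampling step instead.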