Validation of prediction models based on lasso regression with multiply imputed data

Open Access

16 October 2014

journal article
research article
Published by Springer Science and Business Media LLC in BMC Medical Research Methodology

Vol. 14 (1), 116
https://doi.org/10.1186/1471-2288-14-116

Abstract

In prognostic studies, the lasso technique is attractive because it improves the quality of predictions by shrinking regression coefficients, compared with predictions based on a model fitted via unpenalized maximum likelihood. Because some coefficients are set to zero, parsimony is achieved as well. It is unclear whether the performance of a model fitted using the lasso still shows some optimism. Bootstrap methods have been advocated to quantify optimism and generalize model performance to new subjects, but it is unclear how resampling should be performed in the presence of multiply imputed data. The data were based on a cohort of chronic obstructive pulmonary disease (COPD) patients. We constructed models to predict Chronic Respiratory Questionnaire dyspnea 6 months ahead. Optimism of the lasso model was investigated by comparing 4 approaches to handling multiply imputed data in the bootstrap procedure, using the study data and simulated data sets. In the first 3 approaches, data sets that had been completed via multiple imputation (MI) were resampled, while the fourth approach resampled the incomplete data set and then performed MI. The discriminative model performance of the lasso was optimistic. There was suboptimal calibration due to over-shrinkage. The estimate of optimism was sensitive to how imputed data were handled in the bootstrap resampling procedure. Resampling the completed data sets underestimates optimism, especially if, within a bootstrap step, the selected individuals differ across the imputed data sets. Incorporating the MI procedure in the validation yields estimates of optimism that are closer to the true value, albeit slightly too large. Thus, the performance of prognostic models constructed using the lasso technique can be optimistic as well, and the results of internal validation are sensitive to how bootstrap resampling is performed.
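
The fourth approach (resample the incomplete data, then impute within each bootstrap sample) can be illustrated with a minimal sketch. This is not the authors' implementation. All function names are hypothetical, the imputation is a toy random draw from observed values standing in for a proper MI procedure, the model is a one-predictor lasso, and evaluating each bootstrap model on imputed copies of the original data is an assumption about how test performance is obtained.

```python
import random
from statistics import mean

def impute(rows, rng):
    """Toy stochastic imputation (assumption): fill a missing x with a random observed x."""
    observed = [x for x, _ in rows if x is not None]
    return [(x if x is not None else rng.choice(observed), y) for x, y in rows]

def fit_lasso_1d(rows, lam):
    """One-predictor lasso: soft-threshold the covariance, then rescale by the variance."""
    xs, ys = zip(*rows)
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in rows)
    var = mean((x - mx) ** 2 for x in xs)
    shrunk = max(abs(cov) - lam, 0.0) * (1.0 if cov > 0 else -1.0)
    slope = shrunk / var if var > 0 else 0.0
    return my - slope * mx, slope  # (intercept, slope)

def r_squared(model, rows):
    """Explained variation of the model's predictions on the given completed rows."""
    intercept, slope = model
    ys = [y for _, y in rows]
    my = mean(ys)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in rows)
    sst = sum((y - my) ** 2 for y in ys)
    return 1.0 - sse / sst

def optimism_mi_inside_bootstrap(incomplete, n_boot=50, m=3, lam=0.1, seed=0):
    """Estimate optimism by resampling incomplete rows and imputing inside each draw."""
    rng = random.Random(seed)
    # Imputed copies of the original data serve as the test sets (assumption).
    originals = [impute(incomplete, rng) for _ in range(m)]
    diffs = []
    for _ in range(n_boot):
        # Resample individuals with replacement, missing values included.
        boot = [rng.choice(incomplete) for _ in incomplete]
        for _ in range(m):
            completed = impute(boot, rng)        # MI happens after resampling
            model = fit_lasso_1d(completed, lam)
            apparent = r_squared(model, completed)
            test = mean(r_squared(model, orig) for orig in originals)
            diffs.append(apparent - test)
    return mean(diffs)  # average of (apparent - test) performance
```

The first 3 approaches would instead draw the bootstrap sample from already completed data sets, so the imputation step inside the loop would disappear. The estimated optimism is then subtracted from the apparent performance of the model fitted on the full data.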
