Prognostic Modeling with Logistic Regression Analysis

Abstract
Clinical decision making often requires estimates of the likelihood of a dichotomous outcome in individual patients. When empirical data are available, these estimates can be obtained from a logistic regression model. Several strategies may be followed in the development of such a model. In this study, the authors compare alternative strategies in 23 small subsamples from a large data set of patients with an acute myocardial infarction, in which they developed predictive models for 30-day mortality. Evaluations were performed in an independent part of the data set. Specifically, the authors studied the effect of the coding of covariables and of stepwise selection on the discriminative ability of the resulting model, and the effect of statistical “shrinkage” techniques on calibration. As expected, dichotomization of continuous covariables implied a loss of information. Remarkably, stepwise selection resulted in less discriminating models than full models including all available covariables, even when more than half of these were only randomly associated with the outcome. Using qualitative information on the sign of the effect of predictors slightly improved the predictive ability. Calibration improved when shrinkage was applied to the standard maximum likelihood estimates of the regression coefficients. In conclusion, a sensible strategy in small data sets is to apply shrinkage methods in full models that include well-coded predictors selected on the basis of external information.
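
To make the idea of shrinkage concrete, the following is a minimal sketch in Python, not the procedure used in the study. It assumes a uniform (linear) shrinkage factor from the commonly cited heuristic (model chi-square minus degrees of freedom, divided by model chi-square), multiplies the maximum likelihood slopes by that factor, and then re-estimates the intercept so that the mean predicted risk equals the observed event rate. All function names, coefficient values, covariable rows, and the event rate are hypothetical and serve only as illustration.

```python
import math


def heuristic_shrinkage(model_chi2, n_predictors):
    """Uniform shrinkage factor (chi2 - df) / chi2, floored at 0 (assumed heuristic)."""
    if model_chi2 <= 0:
        raise ValueError("model_chi2 must be positive")
    return max(0.0, (model_chi2 - n_predictors) / model_chi2)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mean_risk(intercept, slopes, rows):
    """Average predicted probability over all rows of covariable values."""
    total = 0.0
    for row in rows:
        lp = intercept + sum(b * x for b, x in zip(slopes, row))
        total += sigmoid(lp)
    return total / len(rows)


def recalibrate_intercept(slopes, rows, event_rate, tol=1e-10):
    """Bisection search for the intercept that makes mean risk equal event_rate.

    Works because mean risk increases monotonically with the intercept.
    """
    lo, hi = -30.0, 30.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean_risk(mid, slopes, rows) < event_rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical maximum likelihood slopes for two covariables.
ml_slopes = [0.8, -0.5]
# Hypothetical covariable values for four patients.
rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]

s = heuristic_shrinkage(model_chi2=20.0, n_predictors=2)
shrunk_slopes = [s * b for b in ml_slopes]
new_intercept = recalibrate_intercept(shrunk_slopes, rows, event_rate=0.1)
```

Shrinking the slopes pulls predictions toward the average risk, which is why the intercept is re-estimated afterwards. Bootstrap-based or penalized estimates of the shrinkage factor could be substituted for the heuristic without changing the rest of the sketch.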