Estimation of required sample size for external validation of risk models for binary outcomes

Open Access

21 April 2021

journal article
research article
Published by SAGE Publications in Statistical Methods in Medical Research

Vol. 30 (10), 2187-2206
https://doi.org/10.1177/09622802211007522

Abstract

Risk-prediction models for health outcomes are used in practice as part of clinical decision-making, and it is essential that their performance be externally validated. An important aspect in the design of a validation study is choosing an adequate sample size. In this paper, we investigate the sample size requirements for validation studies with binary outcomes to estimate measures of predictive performance (C-statistic for discrimination and calibration slope and calibration in the large). We aim for sufficient precision in the estimated measures. In addition, we investigate the sample size to achieve sufficient power to detect a difference from a target value. Under normality assumptions on the distribution of the linear predictor, we obtain simple estimators for sample size calculations based on the measures above. Simulation studies show that the estimators perform well for common values of the C-statistic and outcome prevalence when the linear predictor is marginally Normal. Their performance deteriorates only slightly when the normality assumptions are violated. We also propose estimators which do not require normality assumptions but require specification of the marginal distribution of the linear predictor and require the use of numerical integration. These estimators were also seen to perform very well under marginal normality. Our sample size equations require a specified standard error (SE) and the anticipated C-statistic and outcome prevalence. The sample size requirement varies according to the prognostic strength of the model, outcome prevalence, choice of the performance measure and study objective. For example, to achieve an SE < 0.025 for the C-statistic, 60–170 events are required if the true C-statistic and outcome prevalence are between 0.64–0.85 and 0.05–0.3, respectively. For the calibration slope and calibration in the large, achieving SE < 0.15 would require 40–280 and 50–100 events, respectively. Our estimators may also be used for survival outcomes when the proportion of censored observations is high.
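To make the idea of a precision-based sample size for the C-statistic concrete, the following is a minimal sketch that uses the classical Hanley and McNeil (1982) approximation to the standard error of the area under the ROC curve. This is a swapped-in, well-known approximation and not the estimator derived in the paper; the function names `hanley_mcneil_se` and `events_for_target_se`, and the simple upward search over the number of events, are illustrative assumptions.

```python
import math


def hanley_mcneil_se(c, n_events, n_nonevents):
    """Approximate SE of the C-statistic (Hanley & McNeil, 1982).

    Assumption: this classical approximation stands in for the paper's
    own estimator, which is not reproduced here.
    """
    q1 = c / (2.0 - c)
    q2 = 2.0 * c * c / (1.0 + c)
    variance = (
        c * (1.0 - c)
        + (n_events - 1) * (q1 - c * c)
        + (n_nonevents - 1) * (q2 - c * c)
    ) / (n_events * n_nonevents)
    return math.sqrt(variance)


def events_for_target_se(c, prevalence, target_se, max_events=100_000):
    """Smallest number of events whose approximate SE meets the target.

    The total sample size is taken as ceil(events / prevalence), so the
    anticipated outcome prevalence fixes the number of non-events.
    """
    for n_events in range(2, max_events + 1):
        n_total = math.ceil(n_events / prevalence)
        n_nonevents = n_total - n_events
        if n_nonevents < 1:
            continue
        if hanley_mcneil_se(c, n_events, n_nonevents) <= target_se:
            return n_events, n_total
    raise ValueError("target SE not reached within max_events")


if __name__ == "__main__":
    # Hypothetical planning values: anticipated C = 0.75, prevalence = 0.10.
    events, total = events_for_target_se(c=0.75, prevalence=0.10, target_se=0.025)
    print(f"events needed: {events}, total sample size: {total}")
```

The inputs mirror those named in the abstract (a target SE, an anticipated C-statistic and an anticipated outcome prevalence). Because the approximation differs from the paper's estimators, the resulting numbers should be read as rough planning values and need not match the ranges quoted above.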

Funding Information

Medical Research Council (MC_UU_00002/10, MC_UU_12023/29, MR/P015190/1)

