Sample size for positive and negative predictive value in diagnostic research using case–control designs

Abstract
Important properties of diagnostic methods are their sensitivity, specificity, and positive and negative predictive values (PPV and NPV). These methods are typically assessed via case–control samples, which include one cohort of cases known to have the disease and a second control cohort of disease-free subjects. Such studies give direct estimates of sensitivity and specificity but only indirect estimates of PPV and NPV, which also depend on the disease prevalence in the tested population. The motivating example arises in assay testing, where usage is contemplated in populations with known prevalences. Further instances include biomarker development, where subjects are selected from a population with known prevalence and assessment of PPV and NPV is crucial, and the assessment of diagnostic imaging procedures for rare diseases, where case–control studies may be the only feasible designs. We develop formulas for optimal allocation of the sample between the case and control cohorts and for computing sample size when the goal of the study is to prove that the test procedure exceeds pre-stated bounds for PPV and/or NPV. Surprisingly, the optimal sampling schemes for many purposes are highly unbalanced, even when information is desired on both PPV and NPV.