Sample size calculations for 3-level cluster randomized trials

Abstract
Background: The first applications of cluster randomized trials with three instead of two levels are beginning to appear in health research, for instance, in trials that compare different strategies to implement best-practice guidelines. In such trials, the strategy is implemented in health care units ('clusters') and aims at changing the behavior of the health care professionals working in these units ('subjects'), while the effects are measured at the patient level ('evaluations').

Purpose: To guide the choice of the number of clusters, the number of subjects per cluster, and the number of evaluations per subject.

Methods: We derive a sample size formula and investigate the influence of sample allocation on power and on the number of clusters required.

Results: The required sample size is the product of the sample size in the absence of correlation and two variance inflation factors (VIFs) that describe the clustering of evaluations within subjects and of subjects within clusters, respectively. Because each VIF is expressed in terms of an interpretable Pearson correlation, subject matter knowledge can be incorporated. Moreover, these Pearson correlations are related to intracluster correlations (ICCs) from comparable 2-level cluster randomized trials. Formulas are obtained to guide the sample allocation (number of clusters, subjects, and evaluations) for minimizing total sample size, minimizing the number of clusters, or maximizing power given a budget constraint.

Limitations: Empirical estimates of variance components or ICCs from 3-level cluster trials are scarce, which limits how reliably such trials can be powered.

Conclusions: When parameterized in terms of Pearson correlations, the two variance inflation factors give quantitative insight into the impact of the number of clusters, subjects, and evaluations on power. Moreover, when empirical estimates of variance components or ICCs from a pilot or comparable 3-level study are lacking, subject matter knowledge as well as ICCs from 2-level cluster randomized trials can be incorporated in the sample size calculation.

Clinical Trials 2008; 5: 486-495. http://ctj.sagepub.com
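The product structure described in the Results can be illustrated with a short Python sketch. It is not the paper's own formula, which the abstract does not state. It assumes a three-level random-intercept model with variance components for clusters, subjects, and evaluations, takes the correlation between two evaluations of the same subject and the correlation between the mean scores of two subjects in the same cluster as the two Pearson correlations, and uses the usual normal-approximation formula for comparing two means as the sample size in the absence of correlation. All function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def vifs_three_level(var_cluster, var_subject, var_eval, n_subjects, n_evals):
    """Return (vif_eval, vif_subj) under an assumed 3-level random-intercept model."""
    total = var_cluster + var_subject + var_eval
    # Assumed correlation between two evaluations of the same subject.
    rho_eval = (var_cluster + var_subject) / total
    # Variance of a subject's mean over n_evals evaluations.
    var_subject_mean = var_cluster + var_subject + var_eval / n_evals
    # Assumed correlation between the mean scores of two subjects in one cluster.
    rho_subj = var_cluster / var_subject_mean
    vif_eval = 1 + (n_evals - 1) * rho_eval
    vif_subj = 1 + (n_subjects - 1) * rho_subj
    return vif_eval, vif_subj


def clusters_per_arm(delta, var_cluster, var_subject, var_eval,
                     n_subjects, n_evals, alpha=0.05, power=0.80):
    """Clusters needed per arm to detect a mean difference delta (two-sided test)."""
    total = var_cluster + var_subject + var_eval
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # Evaluations per arm if all evaluations were independent (normal approximation).
    n_uncorrelated = 2 * z ** 2 * total / delta ** 2
    vif_eval, vif_subj = vifs_three_level(
        var_cluster, var_subject, var_eval, n_subjects, n_evals)
    # Inflate by both VIFs, then convert evaluations into whole clusters.
    n_inflated = n_uncorrelated * vif_eval * vif_subj
    return math.ceil(n_inflated / (n_subjects * n_evals))


if __name__ == "__main__":
    # Illustrative values only, not estimates from any trial.
    print(vifs_three_level(0.05, 0.15, 0.80, n_subjects=5, n_evals=10))
    print(clusters_per_arm(0.3, 0.05, 0.15, 0.80, n_subjects=5, n_evals=10))
```

Under these assumptions, the product of the two VIFs equals the ratio of the variance of a cluster mean to the variance that the same number of independent evaluations would have, so adding evaluations per subject or subjects per cluster gives diminishing returns compared with adding clusters.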