Association of intracluster correlation measures with outcome prevalence for binary outcomes in cluster randomised trials

Abstract
In cluster randomised trials, a measure of intracluster correlation such as the intraclass correlation coefficient (ICC) should be reported for each primary outcome. Providing intracluster correlation estimates may help in calculating the sample size of future cluster randomised trials and in interpreting the results of the trial from which they are derived. For a binary outcome, the ICC is known to be associated with the outcome prevalence, which raises at least two issues. First, it calls into question the use of an ICC estimate obtained for a binary outcome in one trial for sample size calculations in a subsequent trial in which the same outcome is expected to have a different prevalence. Second, it complicates the interpretation of ICC estimates, because they do not depend solely on the degree of clustering. Other intracluster correlation measures proposed for clustered binary data include the variance partition coefficient, the median odds ratio and the tetrachoric correlation coefficient. Under certain assumptions, the theoretical maximum possible value of an ICC associated with a binary outcome can be derived, and we proposed the relative deviation of an ICC estimate from this maximum value as another measure of intracluster correlation. We conducted a simulation study to explore the dependence of these intracluster correlation measures on outcome prevalence and found that all of them are associated with prevalence. Although all depend on prevalence, the tetrachoric correlation coefficient computed with Kirk’s approach was less dependent on the outcome prevalence than the other measures when the intracluster correlation was about 0.05. We also observed that for lower values of the intracluster correlation, such as 0.01, the analysis of variance estimator of the ICC is preferred.
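
As an illustration of three of the measures named above, the following minimal Python sketch computes the variance partition coefficient and the median odds ratio from a between-cluster variance, and the analysis of variance estimator of the ICC from cluster-level counts. It is a sketch under stated assumptions, not the implementation used in the study: the first two functions assume a random-intercept logistic model with between-cluster variance `sigma2_u` on the logit scale and use the latent-variable form of the variance partition coefficient, and all function names and the example data are hypothetical.

```python
import math
from statistics import NormalDist


def vpc_latent(sigma2_u):
    """Variance partition coefficient, latent-variable form.

    Assumes a random-intercept logistic model, so the individual-level
    residual variance on the logit scale is pi^2 / 3.
    """
    return sigma2_u / (sigma2_u + math.pi ** 2 / 3)


def median_odds_ratio(sigma2_u):
    """Median odds ratio for a between-cluster variance on the logit scale."""
    z75 = NormalDist().inv_cdf(0.75)  # 75th centile of the standard normal
    return math.exp(math.sqrt(2 * sigma2_u) * z75)


def anova_icc(clusters):
    """Analysis of variance estimator of the ICC for a binary outcome.

    clusters: list of (events, size) pairs, one pair per cluster.
    """
    k = len(clusters)
    n_total = sum(n for _, n in clusters)
    p_overall = sum(y for y, _ in clusters) / n_total
    # Between-cluster and within-cluster mean squares
    msb = sum(n * (y / n - p_overall) ** 2 for y, n in clusters) / (k - 1)
    msw = sum(y * (1 - y / n) for y, n in clusters) / (n_total - k)
    # Adjusted mean cluster size, which allows for unequal cluster sizes
    n0 = (n_total - sum(n * n for _, n in clusters) / n_total) / (k - 1)
    return (msb - msw) / (msb + (n0 - 1) * msw)


if __name__ == "__main__":
    # Hypothetical data: (events, cluster size) for six clusters
    example = [(3, 20), (7, 25), (2, 18), (9, 30), (4, 22), (6, 24)]
    print("ANOVA ICC:", anova_icc(example))
    print("VPC:", vpc_latent(0.2))
    print("MOR:", median_odds_ratio(0.2))
```

The analysis of variance estimator can be negative when the observed between-cluster variation is smaller than expected by chance, so a negative value in small examples is not an error in the calculation.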