Determining the Generalizability of Rating Scales in Clinical Settings

Abstract
Traditional approaches to interrater reliability presuppose that reliability is diminished only by undifferentiated random measurement error. Generalizability theory offers a more comprehensive and appropriate framework for examining problems associated with assessments derived from multiple raters. A study in which three physicians used the Functional Limitation Scale (FLS) to assess patient status at five points in time illustrates this approach. Data were analyzed in a multi-way factorial ANOVA design with raters (3), occasions (5), and patients (15) as factors. Examination of the variance components and associated generalizability coefficients revealed that raters, alone or in interaction with other factors, contributed little variance, while the greatest proportion of variance was attributable to differences among measurement occasions, differences among patients, or the interaction of these two factors. The FLS was found to be capable of differentiating among patients and of detecting the effect of time on patient status, regardless of the physician rater(s). This study illustrates how a single multi-faceted design can answer questions that formerly required several separate data sets.
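
As a rough illustration of the kind of analysis described above, the following Python sketch estimates variance components for a fully crossed patients x raters x occasions design with one score per cell, using the standard expected-mean-square equations for a random-effects model. The function names, the all-random-effects assumption, the clipping of negative estimates to zero, and the choice of patients as the object of measurement are assumptions made for illustration only; the abstract does not state which model or coefficients the study used, and the scores below are simulated, not FLS data.

```python
import numpy as np


def variance_components(x):
    """Estimate variance components for a crossed p x r x o design.

    x: array of shape (patients, raters, occasions), one score per cell.
    Assumes all three factors are random (an illustrative assumption).
    """
    x = np.asarray(x, dtype=float)
    n_p, n_r, n_o = x.shape
    grand = x.mean()

    # Marginal means for main effects and two-way interactions.
    m_p, m_r, m_o = x.mean(axis=(1, 2)), x.mean(axis=(0, 2)), x.mean(axis=(0, 1))
    m_pr, m_po, m_ro = x.mean(axis=2), x.mean(axis=1), x.mean(axis=0)

    # Sums of squares for each effect.
    ss = {
        "p": n_r * n_o * np.sum((m_p - grand) ** 2),
        "r": n_p * n_o * np.sum((m_r - grand) ** 2),
        "o": n_p * n_r * np.sum((m_o - grand) ** 2),
        "pr": n_o * np.sum((m_pr - m_p[:, None] - m_r[None, :] + grand) ** 2),
        "po": n_r * np.sum((m_po - m_p[:, None] - m_o[None, :] + grand) ** 2),
        "ro": n_p * np.sum((m_ro - m_r[:, None] - m_o[None, :] + grand) ** 2),
    }
    # The three-way interaction is confounded with residual error.
    ss["pro,e"] = np.sum((x - grand) ** 2) - sum(ss.values())

    df = {
        "p": n_p - 1, "r": n_r - 1, "o": n_o - 1,
        "pr": (n_p - 1) * (n_r - 1),
        "po": (n_p - 1) * (n_o - 1),
        "ro": (n_r - 1) * (n_o - 1),
        "pro,e": (n_p - 1) * (n_r - 1) * (n_o - 1),
    }
    ms = {k: ss[k] / df[k] for k in ss}

    # Solve the expected-mean-square equations for each component.
    est = {
        "pro,e": ms["pro,e"],
        "pr": (ms["pr"] - ms["pro,e"]) / n_o,
        "po": (ms["po"] - ms["pro,e"]) / n_r,
        "ro": (ms["ro"] - ms["pro,e"]) / n_p,
        "p": (ms["p"] - ms["pr"] - ms["po"] + ms["pro,e"]) / (n_r * n_o),
        "r": (ms["r"] - ms["pr"] - ms["ro"] + ms["pro,e"]) / (n_p * n_o),
        "o": (ms["o"] - ms["po"] - ms["ro"] + ms["pro,e"]) / (n_p * n_r),
    }
    # Negative estimates are set to zero, a common convention.
    return {k: max(float(v), 0.0) for k, v in est.items()}


def g_coefficient(vc, n_r, n_o):
    """Relative generalizability coefficient with patients as the object
    of measurement, averaging over n_r raters and n_o occasions."""
    rel_error = vc["pr"] / n_r + vc["po"] / n_o + vc["pro,e"] / (n_r * n_o)
    total = vc["p"] + rel_error
    return vc["p"] / total if total > 0 else float("nan")


if __name__ == "__main__":
    # Simulated scores (NOT the study's data): 15 patients, 3 raters, 5 occasions.
    rng = np.random.default_rng(0)
    scores = (
        rng.normal(0, 2.0, (15, 1, 1))    # patient effects
        + rng.normal(0, 0.2, (1, 3, 1))   # rater effects
        + rng.normal(0, 1.5, (1, 1, 5))   # occasion effects
        + rng.normal(0, 0.5, (15, 3, 5))  # residual
    )
    vc = variance_components(scores)
    print(vc)
    print(g_coefficient(vc, n_r=3, n_o=5))
```

A small rater component relative to the patient and occasion components would correspond to the pattern the abstract reports. Changing `n_r` and `n_o` in `g_coefficient` shows how the coefficient would be expected to change under a different number of raters or occasions.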