Examination of the Assumptions and Properties of the Graded Item Response Model: An Example Using a Mathematics Performance Assessment

Abstract
With the growing popularity of performance assessments over the last decade, the use of item response theory (IRT) models for polytomously scored items has increased. However, before applying the graded item response model to data derived from a performance assessment, studies are needed to ensure that the assumptions and item parameter properties of the model are satisfied. This study examined the dimensionality of a mathematics performance assessment, the extent to which a subset of the tasks is speeded, and the extent to which the item parameter estimates are stable over time. The results from confirmatory factor analyses on three testing occasions indicated that the mathematics performance assessment is unidimensional on each occasion. For two of the eight tasks that were examined for "speededness," neither the threshold nor the slope parameter estimates were stable over two conditions of administration time (i.e., approximately 5 vs. 10 min), and for another two tasks, only the slope parameter estimates were not stable over the two conditions. The analysis of the stability of item parameter estimates over time indicated that the parameter estimates were stable from the fall of 1991 to the spring of 1992. However, from the fall of 1992 to the spring of 1993, both the slope and threshold parameter estimates were unstable for 2 of the 33 tasks, and for another two tasks, only the threshold estimates differed. Some potential reasons for the instability of the item parameter estimates and the speededness of tasks are discussed. For example, differential emphasis on instructional content between testing occasions may affect the stability of item parameters over time.
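For reference, a minimal sketch of the parameterization behind the slope and threshold estimates discussed above, assuming the standard logistic form of Samejima's graded response model (the study may use a different scaling constant): for item $i$ with ordered score categories $k = 0, 1, \ldots, m_i$, the probability of responding in category $k$ or higher is

\[ P^{*}_{ik}(\theta) = \frac{\exp[a_i(\theta - b_{ik})]}{1 + \exp[a_i(\theta - b_{ik})]}, \qquad P^{*}_{i0}(\theta) = 1, \quad P^{*}_{i,\,m_i+1}(\theta) = 0, \]

so the probability of a response exactly in category $k$ is $P_{ik}(\theta) = P^{*}_{ik}(\theta) - P^{*}_{i,\,k+1}(\theta)$. Here $a_i$ is the slope (discrimination) parameter and the $b_{ik}$ are the ordered category thresholds; these correspond to the slope and threshold parameter estimates whose stability is examined in this study.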