Estimating Ability With the Wrong Model

Abstract
No model is ever a perfect reflection of the data it is meant to summarize; there are always errors of fit. This is as true of modern item response theory (IRT) models as of any others. It is important to know to what extent the accuracy of measurements made with these models is influenced by misfit, and what can be done to minimize the resulting inaccuracy. First, a detailed general model was fit to data to provide the framework for a realistic simulation structure. Then three of the most commonly used IRT models were fit within this simulation. A variety of robust estimators of ability were used, and the accuracy and efficiency of each estimator were determined. With short tests, a simple model coupled with a robust estimator seemed to be the methodology of choice for describing the data. As test length increased, so too did the benefits of using a more complex parameterization. An unexpected finding was that coupling robust estimators with a Bayesian prior yielded substantial shrinkage. Future work on ability estimation, especially for practical applications of adaptive testing, is required to “unshrink” ability estimates.
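
The abstract does not say which robust estimators were studied, so what follows is only a minimal illustrative sketch, not the paper's method: a Tukey-biweight-weighted Newton-Raphson maximum-likelihood estimate of ability under the two-parameter logistic (2PL) model, one of the commonly used IRT models. The names p2pl and biweight_theta and the tuning constant tune=4.0 are assumptions made for this example.

    import numpy as np

    def p2pl(theta, a, b):
        # 2PL item response function: probability of a correct response
        # to an item with discrimination a and difficulty b.
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    def biweight_theta(u, a, b, tune=4.0, n_iter=25):
        # Illustrative robust ability estimate (an assumption, not the
        # paper's estimator): Newton-Raphson MLE under the 2PL, with each
        # item's contribution damped by Tukey's biweight of its distance
        # z = a * (theta - b) from the current estimate. Items more than
        # `tune` logits away receive zero weight.
        theta = 0.0
        for _ in range(n_iter):
            p = p2pl(theta, a, b)
            z = a * (theta - b)
            w = np.where(np.abs(z) < tune, (1.0 - (z / tune) ** 2) ** 2, 0.0)
            score = np.sum(w * a * (u - p))            # weighted log-likelihood gradient
            info = np.sum(w * a ** 2 * p * (1.0 - p))  # weighted Fisher information
            if info < 1e-8:
                break
            theta += score / info
        return theta

    # Ten items of increasing difficulty; the examinee answers the five
    # easiest correctly, misses the next four, and luckily gets the
    # hardest item right.
    a = np.full(10, 1.2)
    b = np.linspace(-2.0, 2.0, 10)
    u = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 1], dtype=float)
    print(biweight_theta(u, a, b))  # noticeably below the unweighted MLE,
                                    # which the lucky guess inflates

Because the biweight gives little weight to responses far from the current ability estimate, a single aberrant response barely moves the estimate. Adding a normal prior term to the weighted score (a Bayesian modal estimate) would pull the estimate toward the prior mean, which is consistent with the shrinkage the abstract reports.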