Estimating Ability with the Wrong Model

Abstract
Using simulated item response data, the performance of several 'robust' and conventional schemes for ability estimation was evaluated in conjunction with logistic item response theory models (one-, two-, and three-parameter models). The simulated item responses were generated using a model more complex than the usual logistic models; all three models were therefore fundamentally (and realistically) 'wrong'. Consideration was given to estimation with few item responses (four) and with large numbers (20 and 40). With few item responses, the relative 'wrongness' of the model had little effect, whereas the choice of estimator had serious consequences. With many items, the choice of item response model made more difference than the choice of estimator. Implications of these findings for computerized adaptive testing are discussed.

Keywords: AMJACK estimator, Armed Services Vocational Aptitude Battery, biweight estimator, H-estimates, M-estimates.
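As an illustration of the models referred to in the abstract, the sketch below (not the paper's actual estimation code; item parameters and the grid-search estimator are assumptions for illustration) shows the standard three-parameter logistic (3PL) item response function and a simple maximum-likelihood ability estimate obtained by grid search over theta:

```python
import math

def p3pl(theta, a, b, c):
    """3PL item response function: P(correct response | ability theta),
    with discrimination a, difficulty b, and lower asymptote (guessing) c.
    Setting c = 0 gives the 2PL; additionally fixing a gives the 1PL."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def mle_theta(responses, items, lo=-4.0, hi=4.0, step=0.01):
    """Maximum-likelihood ability estimate by grid search.

    responses: list of 0/1 scored item responses
    items:     list of (a, b, c) parameter tuples, one per item
    """
    best_theta, best_ll = lo, -math.inf
    n_steps = int(round((hi - lo) / step))
    for i in range(n_steps + 1):
        theta = lo + i * step
        ll = 0.0
        for u, (a, b, c) in zip(responses, items):
            p = p3pl(theta, a, b, c)
            ll += math.log(p) if u else math.log(1.0 - p)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta
```

A short usage example, with hypothetical item parameters: for a four-item test such as `items = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 1.0, 0.2), (1.5, 0.5, 0.2)]` and a response pattern like `[1, 1, 0, 0]`, `mle_theta` returns the grid point maximizing the log-likelihood. With only four items the likelihood is flat and outlier-prone, which is the regime where the abstract reports that the choice of estimator (e.g., biweight or other robust schemes versus plain maximum likelihood) matters most.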