A primer on classical test theory and item response theory for assessments in medical education
- 1 January 2010
- Review article
- Published by Wiley in Medical Education
- Vol. 44 (1), 109-117
- https://doi.org/10.1111/j.1365-2923.2009.03425.x
Abstract
A test score is a number which purportedly reflects a candidate's proficiency in some clearly defined knowledge or skill domain. A test theory model is necessary to help us better understand the relationship between the observed (or actual) score on an examination and the underlying proficiency in the domain, which is generally unobserved. Common test theory models include classical test theory (CTT) and item response theory (IRT). The widespread use of IRT models over the past several decades attests to their importance in the development and analysis of assessments in medical education. Item response theory models are used for a host of purposes, including item analysis, test form assembly and equating. Although helpful in many circumstances, IRT models make fairly strong assumptions and are mathematically much more complex than CTT models. Consequently, there are instances in which it might be more appropriate to use CTT, especially when common assumptions of IRT cannot be readily met, or in more local settings, such as those that may characterise many medical school examinations. The objective of this paper is to provide an overview of both CTT and IRT for the practitioner involved in the development and scoring of medical education assessments. The tenets of CTT and IRT are initially described. Then, the main uses of both models in test development and psychometric activities are illustrated via several practical examples. Finally, general recommendations pertaining to the use of each model in practice are outlined. Classical test theory and IRT are widely used to address measurement-related issues that arise from commonly used assessments in medical education, including multiple-choice examinations, objective structured clinical examinations, ward ratings and workplace evaluations. The present paper provides an introduction to these models and how they can be applied to answer common assessment questions.
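As a concrete illustration of the two frameworks the abstract contrasts, the sketch below computes a standard CTT reliability index (coefficient alpha) and a simple IRT item response function (the one-parameter, or Rasch, model). This is a minimal sketch, not material from the paper: the score matrix is hypothetical, and the function names are invented for illustration.

```python
import math

def cronbach_alpha(scores):
    """Coefficient alpha: a CTT estimate of internal-consistency
    reliability for k items scored across n candidates."""
    k = len(scores[0])

    def var(xs):
        # Sample variance (n - 1 denominator).
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [var([row[j] for row in scores]) for j in range(k)]
    total_var = var([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def rasch_probability(theta, b):
    """1PL (Rasch) item response function: probability that a candidate
    with latent proficiency theta answers an item of difficulty b correctly."""
    return 1 / (1 + math.exp(-(theta - b)))

# Hypothetical data: 5 candidates x 4 dichotomously scored items.
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
print(round(cronbach_alpha(scores), 3))            # → 0.696
print(rasch_probability(theta=0.0, b=0.0))         # → 0.5 when theta == b
```

The Rasch curve captures the core IRT idea the abstract alludes to: the probability of a correct response is modelled as a function of the unobserved proficiency, so the same item parameters can support item analysis and equating across test forms, whereas alpha summarises reliability for one fixed set of items and examinees.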