A primer on classical test theory and item response theory for assessments in medical education
- 1 January 2010
- Review article
- Published by Wiley in Medical Education
- Vol. 44 (1), 109-117
- https://doi.org/10.1111/j.1365-2923.2009.03425.x
Abstract
A test score is a number which purportedly reflects a candidate's proficiency in some clearly defined knowledge or skill domain. A test theory model is necessary to help us better understand the relationship between the observed (or actual) score on an examination and the underlying proficiency in the domain, which is generally unobserved. Common test theory models include classical test theory (CTT) and item response theory (IRT). The widespread use of IRT models over the past several decades attests to their importance in the development and analysis of assessments in medical education. Item response theory models are used for a host of purposes, including item analysis, test form assembly and equating. Although helpful in many circumstances, IRT models make fairly strong assumptions and are mathematically much more complex than CTT models. Consequently, there are instances in which it might be more appropriate to use CTT, especially when common assumptions of IRT cannot be readily met, or in more local settings, such as those that may characterise many medical school examinations. The objective of this paper is to provide an overview of both CTT and IRT for the practitioner involved in the development and scoring of medical education assessments. The tenets of CTT and IRT are initially described. Then, the main uses of both models in test development and psychometric activities are illustrated via several practical examples. Finally, general recommendations pertaining to the use of each model in practice are outlined. Classical test theory and IRT are widely used to address measurement-related issues that arise from commonly used assessments in medical education, including multiple-choice examinations, objective structured clinical examinations, ward ratings and workplace evaluations. The present paper provides an introduction to these models and how they can be applied to answer common assessment questions.
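As a concrete illustration of the two frameworks the abstract contrasts, the sketch below computes a standard CTT reliability index (coefficient alpha) and a simple IRT item response function (the one-parameter, or Rasch, model). This is a minimal sketch, not material from the paper: the score matrix is hypothetical, and the function names are invented for illustration.

```python
import math

def cronbach_alpha(scores):
    """Coefficient alpha: a CTT estimate of internal-consistency
    reliability for k items scored across n candidates."""
    k = len(scores[0])

    def var(xs):
        # Sample variance (n - 1 denominator).
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [var([row[j] for row in scores]) for j in range(k)]
    total_var = var([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def rasch_probability(theta, b):
    """1PL (Rasch) item response function: probability that a candidate
    with latent proficiency theta answers an item of difficulty b correctly."""
    return 1 / (1 + math.exp(-(theta - b)))

# Hypothetical data: 5 candidates x 4 dichotomously scored items.
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
print(round(cronbach_alpha(scores), 3))            # → 0.696
print(rasch_probability(theta=0.0, b=0.0))         # → 0.5 when theta == b
```

The Rasch curve captures the core IRT idea the abstract alludes to: the probability of a correct response is modelled as a function of the unobserved proficiency, so the same item parameters can support item analysis and equating across test forms, whereas alpha summarises reliability for one fixed set of items and examinees.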