Factors influencing reproducibility of tests using standardized patients

Abstract
Psychometric studies of assessment with standardized patients (SPs) have established that substantial testing time is required to obtain reproducible scores. Using data drawn from two recent large‐scale studies, this article explores the reasons behind this result and suggests several strategies for reducing testing‐time requirements. Case‐specificity—inconsistency of an examinee's performance over cases—appears to be the major source of measurement error in SP‐based tests. Use of multiple SPs to play the same case role for different examinees does not affect reproducibility of scores at typical test lengths. Similarly, low‐to‐moderate levels of interrater agreement do not markedly affect score reproducibility, as long as a reasonably large number of cases are included in an assessment. Often, SP‐based assessment can be viewed within a mastery‐testing framework, where reproducibility of pass‐fail decisions, rather than scores, is of primary importance. Testing‐time requirements within a mastery‐testing framework depend, in part, on the quality of group performance in relation to the pass‐fail point: If performance is quite good (or quite poor), relatively short tests can be used without compromising reproducibility of pass‐fail decisions. Adoption of a sequential testing approach, in which a brief screening test is given initially to identify examinees close to the pass‐fail point, can further reduce use of testing resources. Alternatively, SP‐based assessment can be incorporated into batteries with written tests to reduce testing time without sacrificing beneficial educational effects.
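The relationship between test length and score reproducibility can be illustrated with a minimal sketch. It assumes a simple persons-crossed-with-cases design, in which the generalizability coefficient for a mean score over n cases is the person variance divided by the sum of the person variance and the residual variance over n. The function names and the variance components below are hypothetical values chosen for illustration. They are not estimates from the two studies, and the sketch is not necessarily the analysis used in this article.

```python
def g_coefficient(var_person: float, var_residual: float, n_cases: int) -> float:
    """Generalizability coefficient for a mean score over n_cases cases.

    Assumes a persons-crossed-with-cases design in which case-specificity
    (the person-by-case interaction) is pooled with residual error.
    """
    if n_cases < 1:
        raise ValueError("n_cases must be at least 1")
    if var_person <= 0 or var_residual < 0:
        raise ValueError("variance components must be positive")
    return var_person / (var_person + var_residual / n_cases)


def cases_needed(var_person: float, var_residual: float, target: float) -> int:
    """Smallest number of cases whose mean score reaches the target coefficient."""
    if not 0 < target < 1:
        raise ValueError("target must lie strictly between 0 and 1")
    n = 1
    while g_coefficient(var_person, var_residual, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical variance components (illustrative only): error variance
    # for a single case is four times the true between-examinee variance.
    var_person, var_residual = 1.0, 4.0
    for n in (1, 4, 8, 16):
        print(n, round(g_coefficient(var_person, var_residual, n), 3))
    print("cases for 0.79:", cases_needed(var_person, var_residual, 0.79))
```

Under these assumed values, a large residual component, such as one dominated by case-specificity, means that many cases are needed before the coefficient approaches conventional thresholds. This is consistent with the substantial testing-time requirement described above.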