Reliability of the long case

Abstract
The use of long cases for summative assessment of clinical competence is limited by concerns about unreliability. This study explores the reliability of long cases and how that reliability changes when long cases are supplemented with short cases. We performed a statistical analysis of examinations held by the Royal Australasian College of Physicians in 2005 and 2006 to determine overall reliability and the sources of score variance attributable to candidate ability, case difficulty and inter-examiner differences. Scores for 546 long cases in 2005 and 773 long cases in 2006 were analysed. In 2006, 38% of the total variation in long case data was explained by variation in candidate ability; the other significant contributors to variance were the candidate × case and candidate × examiner interactions. Similar figures were found for the 2005 examinations. A single short case is less reliable than a single long case, but when examiner time is taken into account, three short cases are as reliable as one long case. Any combination of short and long cases would require 4-5 hours of testing time to achieve dependability > 0.7. Long cases can be optimised for reliability, but time constraints limit their use as the sole tool in a high-stakes examination. Further examiner training, better case selection, or greater use of short cases would have minimal impact on reliability. Reliability can be improved either by increasing examination time or by including additional methods of summative assessment, such as workplace assessment.
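
The relationship between the 38% candidate-ability share and the dependability target of 0.7 can be illustrated with a rough projection. The sketch below is not the authors' analysis. It assumes, as a simplification, that every non-candidate source of variance counts as error and is averaged away in proportion to the number of long cases, each with a fresh case and fresh examiners. The function names `dependability` and `cases_needed` are hypothetical.

```python
import math


def dependability(candidate_share: float, n_cases: int) -> float:
    """Projected dependability when scores are averaged over n_cases long cases.

    Assumption (not from the source): all variance other than candidate
    ability is treated as error and shrinks by a factor of n_cases.
    """
    error_share = 1.0 - candidate_share
    return candidate_share / (candidate_share + error_share / n_cases)


def cases_needed(candidate_share: float, target: float) -> int:
    """Smallest number of long cases whose projected dependability reaches target."""
    error_share = 1.0 - candidate_share
    n = (target / (1.0 - target)) * (error_share / candidate_share)
    return math.ceil(n)


if __name__ == "__main__":
    share = 0.38  # candidate-ability share of variance reported for 2006
    for n in range(1, 7):
        print(n, round(dependability(share, n), 2))
    print("cases needed for 0.7:", cases_needed(share, 0.7))
```

Under these assumptions, about four long cases are needed before the projected dependability passes 0.7. How that count translates into hours depends on the time per case, which the abstract does not state.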