A surgery oral examination

Abstract
Poor interrater reliability is a common objection to the use of oral examinations. In 1990 the authors measured the agreement of 140 U.S. and Canadian surgical raters and the influence, if any, of rater age, years in practice, and experience as an examiner on individual oral examination scores. Eight actor examinees memorized transcripts of actual oral examinations and were videotaped with a single examiner; examinee verbal style, dress, content of answers, and gender were deliberately varied. A repeated-measures analysis of variance was used for data analysis. Three aspects of examinee performance influenced scores: verbal style, dress, and content of answers. No rater characteristic significantly affected scores. Raters showed high agreement (86%) when rating "good" performances but less agreement (67%) when rating "poor" performances. Thus oral examination scores were not influenced by rater selection, and raters ranked good performances more consistently than poor ones. More than one examiner therefore appears necessary to confirm a poor performance during an examination.
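
To illustrate the analytic method named in the abstract, below is a minimal sketch of a repeated-measures ANOVA on simulated rater scores, assuming a long-format table with one score per rater per videotape. The column names, factor levels, effect sizes, and simulated data are all hypothetical and are not the authors' data; the sketch only shows the shape of the analysis, with rater as the subject and the manipulated examinee characteristics as within-subject factors.

```python
# A sketch of a repeated-measures ANOVA, assuming each of 140 raters
# scores all eight videotapes (a 2 x 2 x 2 within-rater design).
# All names, levels, and effects here are hypothetical.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)

raters = [f"rater_{i}" for i in range(140)]
styles = ["confident", "hesitant"]   # verbal style (hypothetical levels)
contents = ["strong", "weak"]        # content of answers
dress_levels = ["formal", "casual"]  # dress

rows = []
for r in raters:
    for s in styles:
        for c in contents:
            for d in dress_levels:
                # Toy effect sizes: content, style, and dress shift the score,
                # plus random rater noise.
                score = (
                    70
                    + (5 if c == "strong" else 0)
                    + (3 if s == "confident" else 0)
                    + (1 if d == "formal" else 0)
                    + rng.normal(0, 4)
                )
                rows.append({"rater": r, "style": s, "content": c,
                             "dress": d, "score": score})
df = pd.DataFrame(rows)

# Rater is the repeated-measures subject; the three manipulated examinee
# characteristics are the within-subject factors.
res = AnovaRM(df, depvar="score", subject="rater",
              within=["style", "content", "dress"]).fit()
print(res)
```

Printing the fitted result gives an F statistic and p value for each within-subject factor and interaction, which is the kind of output that would support a claim that verbal style, dress, and content of answers each influenced scores.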