Putting double marking to the test: a framework to assess if it is worth the trouble

1 March 2005

journal article
Published by Wiley in Medical Education

Vol. 39 (3), 299-308
https://doi.org/10.1111/j.1365-2929.2005.02093.x

Abstract

It is a challenge to assign a mark that accurately measures the quality of students' work in essay-type assessments, which require an element of judgement and fairness from the markers. Double marking such assessments has been seen as a way of improving the reliability of the mark. However, the analysis often looks only for absolute agreement between markers instead of examining all aspects of reliability. The aim of this study was to develop an analytic process that examines the components and meanings of the reliability calculations that can be used to assess the value of double marking a piece of work. An undergraduate case study assessment in General Practice was used as an illustration. Double-marking datasets were collected retrospectively for 1999-2000 and prospectively for 2002-03. Intermarker agreement and its effect on the reliability of students' final marks were assessed using methods appropriate to the type of data collected, together with Generalisability Theory. The data were used to illustrate how to interpret Bland and Altman plots, ANOVA tables and Cohen's kappa calculations. Generalisability Theory showed that, while there was reasonable agreement between markers, the reliability of the student's final mark was still only moderate, probably because of unexplained variability elsewhere in the process. Possible reasons for this variability are discussed. A flowchart of the decisions and actions needed to judge whether a piece of work should be double marked has been constructed.
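The abstract names the analyses without giving their formulas. What follows is a minimal Python sketch of textbook versions of them, under stated assumptions: every script is marked by both markers (a fully crossed students x markers design), numeric marks feed the Bland and Altman limits of agreement and the ANOVA-based G coefficient, and categorical grades feed an unweighted Cohen's kappa (the weighted kappa listed in the references would suit ordered grades better). The function names and the example marks are hypothetical; they are not the study's data, and the study's actual G-study design may include facets not modelled here.

```python
from collections import Counter
from statistics import mean, stdev


def bland_altman(marks_a, marks_b):
    """Bias (mean difference) and 95% limits of agreement between two markers."""
    diffs = [a - b for a, b in zip(marks_a, marks_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def cohens_kappa(grades_a, grades_b):
    """Unweighted Cohen's kappa for two markers' categorical grades."""
    n = len(grades_a)
    p_observed = sum(a == b for a, b in zip(grades_a, grades_b)) / n
    count_a, count_b = Counter(grades_a), Counter(grades_b)
    # Chance agreement: product of the two markers' marginal proportions, summed over grades
    p_chance = sum(count_a[g] * count_b[g] for g in count_a) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)


def g_coefficient(marks, k=None):
    """Relative G coefficient for a fully crossed students x markers design.

    marks: one row per student, one column per marker.
    k: number of markers averaged into the final mark (defaults to all columns).
    """
    n_s, n_m = len(marks), len(marks[0])
    grand = mean(x for row in marks for x in row)
    student_means = [mean(row) for row in marks]
    marker_means = [mean(row[j] for row in marks) for j in range(n_m)]
    # Two-way ANOVA sums of squares (one mark per student-marker cell)
    ss_students = n_m * sum((m - grand) ** 2 for m in student_means)
    ss_markers = n_s * sum((m - grand) ** 2 for m in marker_means)
    ss_total = sum((x - grand) ** 2 for row in marks for x in row)
    ss_residual = ss_total - ss_students - ss_markers
    ms_students = ss_students / (n_s - 1)
    ms_residual = ss_residual / ((n_s - 1) * (n_m - 1))
    # Variance components from expected mean squares (a negative estimate is set to 0)
    var_students = max((ms_students - ms_residual) / n_m, 0.0)
    var_residual = ms_residual  # student x marker interaction confounded with error
    k = k or n_m
    return var_students / (var_students + var_residual / k)


# Hypothetical marks for five scripts, each marked by two markers
marker_1 = [62, 55, 70, 48, 66]
marker_2 = [60, 58, 68, 50, 61]
print(bland_altman(marker_1, marker_2))
print(cohens_kappa(["pass", "pass", "fail", "merit"], ["pass", "fail", "fail", "merit"]))
marks = list(zip(marker_1, marker_2))
print(g_coefficient(marks, k=1), g_coefficient(marks, k=2))
```

With `k=2`, `g_coefficient` estimates the reliability of a final mark averaged over both markers; comparing it with `k=1` is one way to gauge what the second marker adds. As in the abstract's finding, close agreement between markers does not by itself guarantee a highly reliable final mark, because the coefficient also depends on how much true variation there is between students.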

References

An application of judgment analysis to examination marking in psychology
British Journal of Psychology, 2002
Variations among examiners in family medicine residency board oral examinations
Medical Education, 2000
Psychology examiners re-examined: A 5-year perspective
Studies in Higher Education, 1999
The Reliability of Markers
British Journal of Occupational Therapy, 1998
A new approach to exploring biases in educational assessment
British Journal of Psychology, 1996
Performance-Based Assessment: Implications of Task Specificity
Educational Measurement: Issues and Practice, 1994
Factors influencing reproducibility of tests using standardized patients
Teaching and Learning in Medicine, 1989
Statistical methods for assessing agreement between two methods of clinical measurement
The Lancet, 1986
Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit
Psychological Bulletin, 1968
A Coefficient of Agreement for Nominal Scales
Educational and Psychological Measurement, 1960