Comparison of the validity and reliability of two image classification systems for the assessment of mammogram quality

Abstract
Objective: To compare the reliability and validity of two classification systems used to evaluate the quality of mammograms: PGMI ('perfect', 'good', 'moderate' and 'inadequate') and EAR ('excellent', 'acceptable' and 'repeat'). Setting: The New South Wales (Australia) population-based mammography screening programme (BreastScreen NSW). Methods: Thirty sets of mammograms were rated by 21 radiographers and an expert panel. PGMI and EAR criteria were used to assign ratings to the medio-lateral oblique (MLO) and cranio-caudal (CC) views for each set of films. Inter-observer reliability and criterion validity (compared with expert panel ratings) were assessed using mean weighted observed agreement and kappa statistics. Results: Reliability: Kappa values for both classification systems were low (0.01–0.17). PGMI produced significantly higher values than EAR. Agreement between raters was higher using PGMI than EAR for the MLO view (77% versus 74%, P [...]). Conclusions: Both PGMI and EAR have poor reliability and validity in evaluating mammogram quality. EAR is not a suitable alternative to PGMI, which must be improved if it is to be useful.
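
To illustrate the two statistics named in the Methods, the following is a minimal sketch of weighted observed agreement and weighted kappa for a pair of raters, averaged over all rater pairs. It rests on assumptions not stated in the abstract: linear agreement weights, Cohen's pairwise form of kappa, and an integer coding of the ordinal categories (for PGMI, 0 = inadequate up to 3 = perfect). The function names and the ratings shown are hypothetical and are not data from the study.

```python
from itertools import combinations


def weighted_agreement_and_kappa(a, b, n_categories):
    """Return (weighted observed agreement, weighted kappa) for two raters.

    a, b: equal-length lists of integer category codes in 0..n_categories-1.
    Assumption: linear weights, i.e. full credit for identical ratings and
    credit falling linearly with the distance between categories.
    """
    if len(a) != len(b) or not a:
        raise ValueError("rating lists must be non-empty and of equal length")
    k, n = n_categories, len(a)

    def weight(i, j):
        return 1 - abs(i - j) / (k - 1)

    # Weighted observed agreement: mean credit over all rated items.
    p_obs = sum(weight(x, y) for x, y in zip(a, b)) / n

    # Agreement expected by chance from each rater's marginal distribution.
    pa = [a.count(c) / n for c in range(k)]
    pb = [b.count(c) / n for c in range(k)]
    p_exp = sum(weight(i, j) * pa[i] * pb[j]
                for i in range(k) for j in range(k))

    if p_exp == 1:
        # Both raters used one identical category: kappa is undefined.
        return p_obs, float("nan")
    return p_obs, (p_obs - p_exp) / (1 - p_exp)


def mean_pairwise(ratings_by_rater, n_categories):
    """Average the two statistics over every pair of raters."""
    results = [weighted_agreement_and_kappa(a, b, n_categories)
               for a, b in combinations(ratings_by_rater, 2)]
    mean_agreement = sum(r[0] for r in results) / len(results)
    mean_kappa = sum(r[1] for r in results) / len(results)
    return mean_agreement, mean_kappa


if __name__ == "__main__":
    # Hypothetical PGMI ratings: three raters, five films (illustration only).
    ratings = [
        [3, 2, 2, 1, 0],
        [2, 2, 1, 1, 0],
        [3, 1, 2, 2, 1],
    ]
    agreement, kappa = mean_pairwise(ratings, n_categories=4)
    print(f"mean weighted agreement = {agreement:.2f}, mean kappa = {kappa:.2f}")
```

Because kappa subtracts the agreement expected by chance, a high observed agreement can coexist with a low kappa when most ratings fall into the same one or two categories, which is the pattern reported in the Results.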