Reliability Assessment of Society for Fetal Urology Ultrasound Grading System for Hydronephrosis

Abstract
Purpose: The Society for Fetal Urology introduced a subjective grading system for classifying hydronephrosis that has important implications for patient diagnosis, treatment and outcome. The grading system is frequently used to standardize the severity of hydronephrosis and to compare results among patients and centers. Despite its widespread use, to our knowledge no groups have investigated the reliability of the grading system since its introduction. We assessed the intrarater and interrater reliability of the Society for Fetal Urology grading system for hydronephrosis, and examined levels of agreement by degree of hydronephrosis (grades 0 to 4) and level of experience (staff vs trainee).

Materials and Methods: A series of 50 pediatric renal ultrasound images from patients with a diagnosis of hydronephrosis were assessed by 4 staff members and 4 trainees using the Society for Fetal Urology grading system. Ultrasound images included the kidneys, ureters and bladder to be consistent with practice. After 7 to 14 days each rater repeated the assessment. The nonweighted Cohen κ statistic was used to estimate intrarater and interrater reliability by Society for Fetal Urology grade and training level.

Results: Staff and trainee raters independently assigned Society for Fetal Urology grades to 50 patients (99 renal units). The average number of images per ultrasound was 41, including the right and left kidneys. Overall interrater agreement among staff was substantial for grade 0, moderate for grades 1, 2 and 4, and only slight to fair for grade 3. Intrarater agreement was substantial to almost perfect for staff (range 69% to 94%, κ 0.56 to 0.89) and trainees (range 63% to 90%, κ 0.48 to 0.85).

Conclusions: Our study suggests that the Society for Fetal Urology grading system has good intrarater but modest interrater reliability. Individual rater interpretations of the grading system may explain the modest interrater agreement. Proposed modifications to the Society for Fetal Urology classification system, such as distinguishing between diffuse and segmental cortical thinning, may improve reliability.
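
For readers unfamiliar with the nonweighted Cohen κ statistic named in the Materials and Methods, the following is a minimal sketch of how it can be computed for two sets of grades, whether from two raters or from one rater at two sessions. It is not the authors' analysis code. The function names, the example grades and the one-vs-rest handling of individual grades are illustrative assumptions. The descriptive labels (slight, fair, moderate, substantial, almost perfect) are assumed to follow the commonly used Landis and Koch bands, which the abstract does not explicitly cite.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Nonweighted Cohen kappa for two equal-length sequences of category labels."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same label by both ratings.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: sum over categories of the product of marginal proportions.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both ratings used one identical category throughout; kappa is
        # undefined (0/0), so 1.0 is returned here by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def kappa_for_grade(ratings_a, ratings_b, grade):
    """Kappa for a single grade, treating it as 'this grade' vs 'any other grade'.

    Assumption: a one-vs-rest split is one plausible way to obtain
    grade-specific agreement; the abstract does not state the exact procedure.
    """
    return cohen_kappa([r == grade for r in ratings_a],
                       [r == grade for r in ratings_b])


def landis_koch_label(kappa):
    """Descriptive label for kappa using the Landis and Koch bands (assumed)."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical grades (0 to 4) for 10 renal units; not data from the study.
rater_1 = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
rater_2 = [0, 1, 2, 2, 4, 0, 1, 3, 3, 4]

overall = cohen_kappa(rater_1, rater_2)
grade_3 = kappa_for_grade(rater_1, rater_2, 3)
print(round(overall, 3), landis_koch_label(overall))
print(round(grade_3, 3), landis_koch_label(grade_3))
```

In this made-up example the two raters disagree only on units graded 2 or 3, so overall κ stays fairly high while the grade 3 κ drops. This is the kind of pattern the Results describe for grade 3.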