Cross-validation of two prognostic trauma scores in severely injured patients

Abstract
Introduction
Trauma scoring systems are important tools for outcome prediction and for the severity adjustment that informs trauma quality assessment and research. The discrimination and precision of such systems are tested in validation studies. The German TraumaRegister DGU® (TR-DGU) and the Trauma Audit and Research Network (TARN) from the UK agreed on a cross-validation study to validate their prediction scores (RISC II and PS14, respectively).

Methods
Severely injured patients with an Injury Severity Score (ISS) >= 9 documented in 2015 and 2016 were selected in both registries (primary admissions only). The predictive score from each registry was applied to both selected data sets. Observed and predicted mortality were compared to assess precision; the area under the receiver operating characteristic curve was used for discrimination, and the Hosmer-Lemeshow statistic was calculated for calibration. A subgroup analysis of patients treated in an intensive care unit (ICU) was also carried out.

Results
From the TR-DGU, 40,638 patients were included (observed mortality 11.7%). RISC II predicted a mortality of 11.2%, while PS14 predicted 16.9%. From TARN, 64,622 patients were included (observed mortality 9.7%). PS14 predicted a mortality of 10.6%, while RISC II predicted 17.7%. Despite the identical cutoff of ISS >= 9, the patient groups from the two registries differed considerably in the need for intensive care (88% versus 18%). Subgroup analysis of patients treated in an ICU showed nearly identical values for observed and RISC II-predicted mortality.

Discussion
Each score performed well within its own registry, but performance decreased when a score was applied to the other registry. Part of this loss of performance can be explained by the different development data sets: RISC II is based mainly on patients treated in an ICU, whereas PS14 mainly includes cases cared for outside the ICU with more moderate injury severity, in line with the respective inclusion criteria of the two registries.

Conclusion
External validations of prediction models between registries are needed, but may show that prediction models are not fully transferable to other health-care settings.
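The Methods section names three standard validation metrics: observed versus predicted mortality for precision, the area under the ROC curve for discrimination, and the Hosmer-Lemeshow statistic for calibration. The sketch below shows how these are typically computed, on purely synthetic data (not registry data); the function names, the simulated risk distribution, and the use of deciles for the Hosmer-Lemeshow grouping are illustrative assumptions, not the exact procedures used in the study.

```python
import random


def auc(y_true, y_score):
    """Area under the ROC curve via the Mann-Whitney pairwise formulation:
    the probability that a random positive is scored above a random negative."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # concordant pair
            elif p == n:
                wins += 0.5      # tie counts half
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow(y_true, y_score, groups=10):
    """Hosmer-Lemeshow chi-square statistic over risk deciles; in practice it is
    compared against a chi-square distribution with (groups - 2) degrees of freedom."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    n, chi2 = len(order), 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        obs = sum(y_true[i] for i in idx)       # observed deaths in this risk group
        exp = sum(y_score[i] for i in idx)      # expected deaths = sum of predicted risks
        size = len(idx)
        if 0 < exp < size:
            chi2 += (obs - exp) ** 2 / exp \
                    + ((size - obs) - (size - exp)) ** 2 / (size - exp)
    return chi2


# --- illustration on synthetic, well-calibrated data (hypothetical risks) ---
random.seed(0)
pred = [random.uniform(0.0, 0.5) for _ in range(2000)]   # predicted death risks
died = [1 if random.random() < p else 0 for p in pred]   # simulated outcomes

observed = sum(died) / len(died)           # observed mortality
expected = sum(pred) / len(pred)           # mean predicted mortality
discrimination = auc(died, pred)           # 0.5 = chance, 1.0 = perfect
calibration = hosmer_lemeshow(died, pred)  # small values indicate good calibration
```

Because the outcomes are simulated directly from the predicted risks, observed and expected mortality come out nearly identical here; applying a score to a registry it was not developed on, as in the study, typically shows a larger gap.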