Short‐term prediction of mortality in patients with systemic lupus erythematosus: Classification of outcomes using random forests

Abstract
Objective: To identify demographic and clinical characteristics that classify patients with systemic lupus erythematosus (SLE) at risk for in-hospital mortality.

Methods: Patients hospitalized in California from 1996 to 2000 with a principal diagnosis of SLE (N = 3,839) were identified from a state hospitalization database. As candidate predictors of mortality, we used patient demographic characteristics; the presence or absence of 40 different clinical conditions listed among the discharge diagnoses; and 2 summary indexes derived from the discharge diagnoses, the Charlson Index and the SLE Comorbidity Index. Predictors that identify patients at increased risk of mortality were identified and validated using random forests, a statistical procedure that generalizes single classification trees. Random forests grow each classification tree from a bootstrap sample of patients using randomly selected subsets of predictors, and repeat this process to generate many trees (a forest). Each patient is then classified by majority vote across all trees.

Results: Of the 3,839 patients, 109 died during hospitalization. Selecting from all available predictors, the random forests classified death with excellent predictive accuracy. The mean classification error rate, averaged over 10 forests of 500 trees each, was 11.9%. The most important predictors were the Charlson Index, respiratory failure, the SLE Comorbidity Index, age, sepsis, nephritis, and thrombocytopenia.

Conclusion: Information on clinical diagnoses can be used to accurately predict mortality among hospitalized patients with SLE. Random forests are a useful technique for identifying the most important predictors from a larger (often much larger) set of candidate predictors and for validating the resulting classification.
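
The Methods describe random forests in three steps: a bootstrap sample of patients for each tree, randomly selected subsets of predictors, and a majority vote across all trees. The following is a minimal, dependency-free Python sketch of that procedure, written for illustration only; it is not the authors' code or any library's implementation. Several details are assumptions not stated in the abstract: predictors are re-drawn at every node (as in Breiman's formulation) rather than once per tree, splits minimize Gini impurity, the subset size defaults to the square root of the number of predictors, and the error rate is estimated out-of-bag (OOB), because the abstract does not say how the 11.9% figure was computed. All function names (`random_forest`, `predict_forest`, `oob_error`, `mean_oob_error`, and so on) are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most common label; ties go to the label encountered first."""
    return Counter(labels).most_common(1)[0][0]


def best_split(X, y, idx, features):
    """Threshold split on a candidate feature that most lowers weighted Gini impurity.

    Returns None if no split improves on the parent node."""
    best, best_score = None, gini([y[i] for i in idx])
    for f in features:
        values = sorted({X[i][f] for i in idx})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [i for i in idx if X[i][f] <= t]
            right = [i for i in idx if X[i][f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(idx)
            if score < best_score:
                best, best_score = (f, t, left, right), score
    return best


def grow_tree(X, y, idx, m_try, rng, depth, max_depth):
    """Grow one classification tree on the rows in idx.

    Assumption: m_try predictors are re-drawn at random at every node."""
    labels = [y[i] for i in idx]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority(labels)  # leaf: the predicted class
    features = rng.sample(range(len(X[0])), m_try)
    split = best_split(X, y, idx, features)
    if split is None:
        return majority(labels)
    f, t, left, right = split
    return (f, t,
            grow_tree(X, y, left, m_try, rng, depth + 1, max_depth),
            grow_tree(X, y, right, m_try, rng, depth + 1, max_depth))


def predict_tree(node, x):
    """Follow internal nodes (feature, threshold, left, right) down to a leaf."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def random_forest(X, y, n_trees=500, m_try=None, max_depth=8, seed=0):
    """Grow each tree on a bootstrap sample and remember its out-of-bag rows."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m_try = m_try or max(1, int(p ** 0.5))  # assumed default: sqrt(#predictors)
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        tree = grow_tree(X, y, boot, m_try, rng, 0, max_depth)
        forest.append((tree, set(range(n)) - set(boot)))
    return forest


def predict_forest(forest, x):
    """Classify one patient by majority vote across all trees."""
    return majority([predict_tree(tree, x) for tree, _ in forest])


def oob_error(forest, X, y):
    """Assumed error estimate: each row is voted on only by trees that never saw it."""
    wrong = total = 0
    for i, (x, label) in enumerate(zip(X, y)):
        votes = [predict_tree(tree, x) for tree, oob in forest if i in oob]
        if votes:
            total += 1
            wrong += majority(votes) != label
    return wrong / total if total else float("nan")


def mean_oob_error(X, y, n_forests=10, n_trees=500, **kwargs):
    """Average the OOB error over several independently seeded forests."""
    errors = [oob_error(random_forest(X, y, n_trees, seed=s, **kwargs), X, y)
              for s in range(n_forests)]
    return sum(errors) / len(errors)
```

For example, with `X` a list of per-patient predictor rows (0/1 diagnosis flags, age, index scores) and `y` the 0/1 in-hospital death indicators, `mean_oob_error(X, y)` averages over 10 forests of 500 trees, mirroring the protocol reported above. The sketch does not address class imbalance (109 deaths among 3,839 patients, about 2.8%), under which a plain majority vote tends to favor the majority class, and pure Python is slow at this scale; an established random forest library would be the practical choice for a real analysis.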