Predicting Risk of Sport-Related Concussion in Collegiate Athletes and Military Cadets: A Machine Learning Approach Using Baseline Data from the CARE Consortium Study

Abstract
Objective: To develop a predictive model for sport-related concussion in collegiate athletes and military service academy cadets using baseline data collected during the pre-participation examination.
Methods: Baseline assessments were performed in 15,682 participants from 21 US academic institutions and military service academies participating in the CARE Consortium Study during the 2015–2016 academic year. Participants were monitored for sport-related concussion during the subsequent season. A total of 176 baseline covariates, mapped to 957 binary features, were used as input to a support vector machine model with the goal of learning to stratify participants according to their risk of sport-related concussion. Performance was evaluated in terms of area under the receiver operating characteristic curve (AUROC) on a held-out test set. Model inputs significantly associated with either increased or decreased risk were identified.
Results: A total of 595 participants (3.79%) sustained a concussion during the study period. The predictive model achieved an AUROC of 0.73 (95% confidence interval 0.70–0.76), with variable performance across sports. Features with significant positive and negative associations with subsequent sport-related concussion were identified.
Conclusions: Using only baseline data, this predictive model identified athletes and cadets who would go on to sustain sport-related concussion with accuracy comparable to that of many existing assessment tools used to identify concussion. Furthermore, this study provides insight into potential concussion risk and protective factors.
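As an illustration of the evaluation metric named in the abstract, the following minimal sketch computes AUROC from risk scores on a held-out set and attaches a percentile bootstrap confidence interval. The abstract does not state how the reported interval was obtained, so the bootstrap, the function names `auroc` and `bootstrap_ci`, and the parameters `n_boot`, `alpha`, and `seed` are assumptions made for this example only; the data in the usage lines are synthetic and are not study data.

```python
import random


def auroc(labels, scores):
    """AUROC: probability that a random positive case outscores a random
    negative case, with ties counted as 0.5 (Mann-Whitney U formulation)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    # Assign 1-based ranks to the scores, averaging ranks within ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (an assumed method, see lead-in)."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that happen to contain a single class.
        if 0 < sum(ys) < n:
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic example: rare positive class with slightly higher scores.
_rng = random.Random(42)
y_true = [1 if _rng.random() < 0.05 else 0 for _ in range(2000)]
y_score = [_rng.gauss(0.8 if y else 0.0, 1.0) for y in y_true]
point = auroc(y_true, y_score)
low, high = bootstrap_ci(y_true, y_score, n_boot=200)
print(f"AUROC {point:.2f} (95% CI {low:.2f} to {high:.2f})")
```

In practice the scores would come from the fitted model's decision function on the held-out participants; the rank-based formulation is used here only to keep the sketch free of third-party dependencies.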
Funding Information
  • National Collegiate Athletic Association
  • U.S. Department of Defense (W81XWH-BA170608)