Evaluation of a Parsimonious COVID-19 Outbreak Prediction Model: Heuristic Modeling Approach Using Publicly Available Data Sets

Abstract
Journal of Medical Internet Research preprint. Warning: This is an unreviewed preprint. It has not been peer-reviewed by expert or patient reviewers or by an academic editor, may contain misleading claims, and is likely to change before final publication, if accepted, or may have been rejected or withdrawn.

Background: The coronavirus disease 2019 (COVID-19) pandemic has changed public health policies and personal lifestyles through lockdowns and mandates. Governments are rapidly evolving policies to increase hospital capacity and supply personal protective equipment to mitigate disease spread in distressed regions. Current approaches to predicting COVID-19 case counts and spread, such as deep learning models, offer limited explainability and generalizability. This leaves a need for highly accurate and robust outbreak prediction models that balance parsimony and fit.

Objective: We seek to leverage readily accessible data sets from multiple states to train and evaluate a parsimonious predictive model capable of identifying county-level risk of COVID-19 outbreaks on a day-to-day basis.

Methods: Our methods use the following data inputs: COVID-19 case counts per county per day and county populations. We developed an outbreak gold standard across California, Indiana, and Iowa. The model was trained on data from March 1 to August 31, 2020, then tested on data from September 1 to October 31, 2020, against the gold standard to derive confusion matrix statistics.

Results: The model reported sensitivities of 92%, 90%, and 81% for Indiana, Iowa, and California, respectively. The precision in each state was above 85%, and the specificity and accuracy were generally greater than 95%.

Conclusions: The parsimonious model provides a generalizable and simple alternative approach to outbreak prediction. Our methodology could be tested on diverse regions to aid government officials and hospitals with resource allocation.
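The confusion matrix statistics reported in the abstract (sensitivity, precision, specificity, and accuracy) can be illustrated with a minimal sketch. It assumes each county-day carries a binary outbreak label from the gold standard and a binary model prediction. The function name `confusion_stats`, the `ConfusionStats` container, and the toy labels are hypothetical and do not come from the paper, which does not describe its implementation here.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ConfusionStats:
    """Hypothetical container for the four statistics named in the abstract."""
    sensitivity: float  # TP / (TP + FN): share of true outbreaks that were flagged
    specificity: float  # TN / (TN + FP): share of non-outbreaks correctly left unflagged
    precision: float    # TP / (TP + FP): share of flagged county-days that were outbreaks
    accuracy: float     # (TP + TN) / all county-days


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a class is absent from the test window.
    return numerator / denominator if denominator else math.nan


def confusion_stats(predicted: Sequence[bool], gold: Sequence[bool]) -> ConfusionStats:
    """Compare per-county-day predictions against gold-standard outbreak labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")

    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)          # outbreak predicted, outbreak occurred
    fp = sum(1 for p, g in pairs if p and not g)      # outbreak predicted, none occurred
    tn = sum(1 for p, g in pairs if not p and not g)  # no outbreak predicted, none occurred
    fn = sum(1 for p, g in pairs if not p and g)      # outbreak missed

    return ConfusionStats(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        precision=_ratio(tp, tp + fp),
        accuracy=_ratio(tp + tn, len(pairs)),
    )


# Toy labels for six county-days (illustrative only, not data from the study).
toy_predicted = [True, True, False, False, True, False]
toy_gold = [True, False, False, False, True, True]
toy_stats = confusion_stats(toy_predicted, toy_gold)
print(toy_stats)
```

In practice the same function would be applied separately to the test-period county-days of each state, which is how per-state figures such as those in the Results could be tabulated.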