Sociodemographic and clinical features predictive of SARS-CoV-2 test positivity across healthcare visit-types
Open Access
- 14 October 2021
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 16 (10), e0258339
- https://doi.org/10.1371/journal.pone.0258339
Abstract
Despite increased testing efforts and the deployment of vaccines, COVID-19 cases and death toll continue to rise at record rates. Health systems routinely collect clinical and non-clinical information in electronic health records (EHR), yet little is known about how the minimal or intermediate spectra of EHR data can be leveraged to characterize patient SARS-CoV-2 pretest probability in support of interventional strategies. We modeled patient pretest probability for SARS-CoV-2 test positivity and determined which features contributed to the prediction, both overall and for patients triaged in inpatient, outpatient, and telehealth/drive-up visit-types. Data from the University of Washington (UW) Medicine Health System, which excluded UW Medicine care providers and included patients predominantly residing in the Seattle Puget Sound area, were used to develop a gradient-boosting decision tree (GBDT) model. Patients were included if they had at least one visit prior to initial SARS-CoV-2 RT-PCR testing between January 1, 2020 and August 7, 2020. Model performance was assessed using area-under-the-receiver-operating-characteristic (AUROC) and area-under-the-precision-recall (AUPR) curves. Feature contributions were assessed using SHapley Additive exPlanations (SHAP) values. The generalized pretest probability model using all available features achieved high overall discriminative performance (AUROC, 0.82). Performance among inpatients (AUROC, 0.86) was higher than for telehealth/drive-up testing (AUROC, 0.81) or outpatient testing (AUROC, 0.76). The two-week test positivity rate in the patient's ZIP code was the most informative feature for predicting test positivity across visit-types. Geographic and sociodemographic factors were more important predictors of SARS-CoV-2 positivity than individual clinical characteristics.
Recent geographic and sociodemographic factors, routinely collected in EHR though not routinely considered in clinical care, are the strongest predictors of initial SARS-CoV-2 test result. These findings were consistent across visit types, informing our understanding of individual SARS-CoV-2 risk factors with implications for deployment of testing, outreach, and population-level prevention efforts.
Funding Information
- Microsoft Research
- UW Population Health Initiative
- National Institute of General Medical Sciences (5T32GM086270)
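The abstract reports AUROC separately for each visit-type. As a rough illustration of what that stratified metric measures, the sketch below computes AUROC per group in plain Python using the pairwise (Mann-Whitney) formulation. It is not the authors' code: the function names `auroc` and `auroc_by_group`, the record layout, and the toy scores are all hypothetical, and the numbers bear no relation to the study data.

```python
from collections import defaultdict


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auroc_by_group(records):
    """Compute AUROC separately for each group.

    records: iterable of (group, label, score) tuples, where label is
    1 for a positive test and 0 for a negative test.
    """
    grouped = defaultdict(lambda: ([], []))
    for group, label, score in records:
        grouped[group][0].append(label)
        grouped[group][1].append(score)
    return {g: auroc(ys, ss) for g, (ys, ss) in grouped.items()}


# Hypothetical toy data: (visit_type, test_result, predicted_probability)
records = [
    ("inpatient", 1, 0.9), ("inpatient", 0, 0.2),
    ("inpatient", 1, 0.7), ("inpatient", 0, 0.4),
    ("outpatient", 1, 0.6), ("outpatient", 0, 0.7),
    ("outpatient", 1, 0.8), ("outpatient", 0, 0.3),
]
results = auroc_by_group(records)
for visit_type, value in sorted(results.items()):
    print(f"{visit_type}: AUROC = {value:.2f}")
```

The pairwise loop is quadratic in group size, which is fine for a demonstration; for cohorts of realistic size a rank-based or library implementation would be the more practical choice.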