Assessment of autoregressive integrated moving average (ARIMA), generalized linear autoregressive moving average (GLARMA), and random forest (RF) time series regression models for predicting influenza A virus frequency in swine in Ontario, Canada

Abstract
Influenza A virus commonly circulating in swine (IAV-S) is characterized by large genetic and antigenic diversity and, thus, improvements in different aspects of IAV-S surveillance are needed to achieve desirable goals of surveillance such as to establish the capacity to forecast with the greatest accuracy the number of influenza cases likely to arise. Advancements in modeling approaches provide the opportunity to use different models for surveillance. However, in order to make improvements in surveillance, it is necessary to assess the predictive ability of such models. This study compares the sensitivity and predictive accuracy of the autoregressive integrated moving average (ARIMA) model, the generalized linear autoregressive moving average (GLARMA) model, and the random forest (RF) model with respect to the frequency of influenza A virus (IAV) in Ontario swine. Diagnostic data on IAV submissions in Ontario swine between 2007 and 2015 were obtained from the Animal Health Laboratory (University of Guelph, Guelph, ON, Canada). Each modeling approach was examined for predictive accuracy, evaluated by the root mean square error, the normalized root mean square error, and the model’s ability to anticipate increases and decreases in disease frequency. Likewise, we verified the magnitude of improvement offered by the ARIMA, GLARMA and RF models over a seasonal-naïve method. Using the diagnostic submissions, the occurrence of seasonality and the long-term trend in IAV infections were also investigated. The RF model had the smallest root mean square error in the prospective analysis and tended to predict increases in the number of diagnostic submissions and positive virological submissions at weekly and monthly intervals with a higher degree of sensitivity than the ARIMA and GLARMA models. The number of weekly positive virological submissions is significantly higher in the fall calendar season compared to the summer calendar season. Positive counts at weekly and monthly intervals demonstrated a significant increasing trend. Overall, this study shows that the RF model offers enhanced prediction ability over the ARIMA and GLARMA time series models for predicting the frequency of IAV infections in diagnostic submissions.