Forecasting influenza in Hong Kong with Google search queries and statistical model fusion
Open Access
- 2 May 2017
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 12 (5), e0176690
- https://doi.org/10.1371/journal.pone.0176690
Abstract
The objective of this study is to investigate the predictive utility of online social media and web search queries, particularly Google search data, to forecast new cases of influenza-like-illness (ILI) in general outpatient clinics (GOPC) in Hong Kong. To mitigate the impact of sensitivity to self-excitement (i.e., fickle media interest) and other artifacts of online social media data, our approach fuses multiple offline and online data sources. Four individual models, namely a generalized linear model (GLM), the least absolute shrinkage and selection operator (LASSO), an autoregressive integrated moving average (ARIMA) model, and deep learning (DL) with Feedforward Neural Networks (FNN), are employed to forecast ILI-GOPC both one week and two weeks in advance. The covariates include Google search queries, meteorological data, and previously recorded offline ILI. To our knowledge, this is the first study that introduces deep learning methodology into surveillance of infectious diseases and investigates its predictive utility. Furthermore, to exploit the strengths of each individual forecasting model, we apply statistical model fusion via Bayesian model averaging (BMA), which allows a systematic integration of multiple forecast scenarios. For each model, an adaptive approach is used to capture the recent relationship between ILI and covariates. DL with FNN appears to deliver the most competitive predictive performance among the four considered individual models. Combining all four models in a comprehensive BMA framework further improves predictive evaluation metrics such as root mean squared error (RMSE) and mean absolute predictive error (MAPE). Nevertheless, DL with FNN remains the preferred method for predicting locations of influenza peaks. The proposed approach can be viewed as a feasible alternative for forecasting ILI in Hong Kong or in other countries where ILI has no constant seasonal trend and influenza data resources are limited. The proposed methodology is easily tractable and computationally efficient.
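The model fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes equal prior model probabilities, Gaussian forecast errors with a per-model variance estimated from a recent training window, and it takes the mean absolute predictive error to be the plain mean of absolute forecast errors. All function names and the toy numbers are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mean_abs_error(actual, predicted):
    """Mean of absolute forecast errors (assumed reading of the abstract's MAPE)."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n

def bma_weights(actual, model_preds):
    """Approximate posterior model weights on a training window.

    Assumptions (not from the paper): equal model priors and Gaussian
    errors, with each model's variance set to its maximum-likelihood value.
    """
    n = len(actual)
    log_liks = []
    for preds in model_preds:
        sse = sum((a - p) ** 2 for a, p in zip(actual, preds))
        var = max(sse / n, 1e-12)  # guard against a zero variance
        # Gaussian log-likelihood evaluated at the MLE variance
        log_liks.append(-0.5 * n * (math.log(2 * math.pi * var) + 1))
    top = max(log_liks)  # subtract the maximum for numerical stability
    unnorm = [math.exp(ll - top) for ll in log_liks]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def fuse(weights, forecasts):
    """Weighted average of the individual models' forecasts."""
    return sum(w * f for w, f in zip(weights, forecasts))

# Hypothetical ILI rates over a short recent window and two models' fits.
actual = [3.0, 4.0, 6.0, 5.0]
model_a = [3.1, 4.2, 5.8, 5.1]
model_b = [2.0, 5.0, 7.5, 4.0]

weights = bma_weights(actual, [model_a, model_b])
# Combine the two models' (hypothetical) one-week-ahead forecasts.
fused = fuse(weights, [5.5, 6.5])
```

Recomputing the weights each week on the most recent window would mirror the adaptive approach mentioned in the abstract; the window length is a tuning choice that this sketch leaves open.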
Funding Information
- City University of Hong Kong (CityU8/CRF/12G)
- University Grants Committee (T32-102/14-N)
- National Natural Science Foundation of China (71420107023)
- Division of Mathematical Sciences (NSF DMS 1514808)