Feature selection for global tropospheric ozone prediction based on the BO-XGBoost-RFE algorithm
Open Access
- 2 June 2022
- journal article
- research article
- Published by Springer Science and Business Media LLC in Scientific Reports
- Vol. 12 (1), 1-10
- https://doi.org/10.1038/s41598-022-13498-2
Abstract
Ozone is one of the most important air pollutants, with significant impacts on human health, regional air quality and ecosystems. In this study, we use geographic and environmental information from monitoring sites in 5577 regions worldwide, covering 2010 to 2014, as input features to predict the long-term average ozone concentration at each site. We propose BO-XGBoost-RFE, an XGBoost-RFE feature selection model tuned by Bayesian optimization, and use a variety of machine learning algorithms to predict ozone concentration from the optimal feature subset. Because recursive feature selection depends on the hyperparameters of the underlying model, different hyperparameter combinations lead to different selected feature subsets, so the subset obtained by the model may not be optimal. We therefore use Bayesian optimization to tune the parameters of XGBoost-based recursive feature elimination, obtaining the optimal parameter combination and the optimal feature subset under that combination. Experiments on long-term ozone concentration prediction on a global scale show that prediction accuracy after BO-XGBoost-RFE feature selection is higher than that obtained with all features or with feature selection based on Pearson correlation. Among the four prediction models, random forest achieved the highest prediction accuracy, and the XGBoost prediction model showed the greatest improvement in accuracy.
This publication has 21 references indexed in Scilit:
- Tropospheric Ozone Assessment Report: Database and metrics data of global surface ozone observations. Elementa: Science of the Anthropocene, 2017
- XGBoost. Published by Association for Computing Machinery (ACM), 2016
- Handling high-dimensional data in air pollution forecasting tasks. Ecological Informatics, 2016
- Taking the Human Out of the Loop: A Review of Bayesian Optimization. Proceedings of the IEEE, 2015
- Tropospheric ozone and its precursors from the urban to the global scale from air quality to short-lived climate forcer. Atmospheric Chemistry and Physics, 2015
- Global crop yield reductions due to surface ozone exposure: 1. Year 2000 crop production losses and economic damage. Atmospheric Environment, 2011
- Prediction of hourly O3 concentrations using support vector regression algorithms. Atmospheric Environment, 2010
- A Feature Selection Method for Air Quality Forecasting. Lecture Notes in Computer Science, 2010
- Gene Selection for Cancer Classification using Support Vector Machines. Machine Learning, 2002
- Atmospheric Chemistry of Tropospheric Ozone Formation: Scientific and Regulatory Implications. Air & Waste, 1993
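
The idea summarized in the abstract, recursive feature elimination whose underlying booster is tuned by Bayesian optimization, can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on several assumptions: scikit-learn's `GradientBoostingRegressor` stands in for XGBoost, synthetic data stands in for the ozone monitoring data, the Bayesian optimizer is a small Gaussian-process loop with an expected-improvement criterion, and the tuned hyperparameters (`learning_rate`, `max_depth`), their bounds, and the helper names `rfe_score` and `bo_rfe` are hypothetical choices made for illustration.

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import RFE
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import cross_val_score

# Assumed search bounds for (learning_rate, max_depth); not taken from the paper.
LOW = np.array([0.01, 2.0])
HIGH = np.array([0.3, 5.0])


def rfe_score(params, X, y, n_keep):
    """Run RFE with the given booster hyperparameters.

    Returns (mean cross-validated R^2 on the kept features, boolean feature mask).
    Simplification: features are selected on the full data before scoring.
    """
    learning_rate, max_depth = params
    model = GradientBoostingRegressor(  # stand-in for XGBoost
        learning_rate=float(learning_rate),
        max_depth=int(round(max_depth)),
        n_estimators=30,
        random_state=0,
    )
    selector = RFE(model, n_features_to_select=n_keep, step=1).fit(X, y)
    mask = selector.support_
    score = cross_val_score(model, X[:, mask], y, cv=3, scoring="r2").mean()
    return float(score), mask


def expected_improvement(gp, candidates, best):
    """Expected improvement over the best score so far (maximization)."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)


def bo_rfe(X, y, n_keep, n_init=3, n_iter=5, seed=0):
    """Tune the RFE booster with a small Bayesian-optimization loop."""
    rng = np.random.default_rng(seed)
    tried = rng.uniform(LOW, HIGH, size=(n_init, 2))
    results = [rfe_score(p, X, y, n_keep) for p in tried]
    for _ in range(n_iter):
        scores = np.array([r[0] for r in results])
        # Surrogate model of score as a function of (scaled) hyperparameters.
        gp = GaussianProcessRegressor(
            kernel=Matern(nu=2.5), alpha=1e-3, normalize_y=True, random_state=seed
        )
        gp.fit((tried - LOW) / (HIGH - LOW), scores)
        # Pick the random candidate with the highest expected improvement.
        cand = rng.uniform(LOW, HIGH, size=(200, 2))
        ei = expected_improvement(gp, (cand - LOW) / (HIGH - LOW), scores.max())
        nxt = cand[np.argmax(ei)]
        tried = np.vstack([tried, nxt])
        results.append(rfe_score(nxt, X, y, n_keep))
    best = int(np.argmax([r[0] for r in results]))
    return tried[best], results[best][0], results[best][1]


if __name__ == "__main__":
    # Synthetic regression data; the study itself uses monitoring-site features.
    X, y = make_regression(
        n_samples=200, n_features=8, n_informative=3, noise=5.0, random_state=0
    )
    params, score, mask = bo_rfe(X, y, n_keep=4)
    print("learning_rate=%.3f max_depth=%d" % (params[0], int(round(params[1]))))
    print("CV R^2 = %.3f, kept features:" % score, np.flatnonzero(mask))
```

Each hyperparameter setting yields its own feature subset, so the loop returns the setting and subset that scored best together. In a more careful setup the feature selection would be nested inside the cross-validation folds to avoid optimistic scores.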