Predicting Fine Particulate Matter (PM2.5) in the Greater London Area: An Ensemble Approach using Machine Learning Methods
Open Access
- 11 March 2020
- journal article
- research article
- Published by MDPI AG in Remote Sensing
- Vol. 12 (6), 914
- https://doi.org/10.3390/rs12060914
Abstract
Estimating air pollution exposure has long been a challenge for environmental health researchers. Technological advances and novel machine learning methods have allowed us to increase the geographic range and accuracy of exposure models, making them a valuable tool in conducting health studies and identifying hotspots of pollution. Here, we have created a prediction model for daily PM2.5 levels in the Greater London area from 1st January 2005 to 31st December 2013 using an ensemble machine learning approach incorporating satellite aerosol optical depth (AOD), land use, and meteorological data. The predictions were made on a 1 km × 1 km scale over 3960 grid cells. The ensemble included predictions from three different machine learners: a random forest (RF), a gradient boosting machine (GBM), and a k-nearest neighbor (KNN) approach. Our ensemble model performed very well, with a ten-fold cross-validated R² of 0.828. Of the three machine learners, the random forest outperformed the GBM and KNN. Our model was particularly adept at predicting day-to-day changes in PM2.5 levels, with an out-of-sample temporal R² of 0.882. However, its ability to predict spatial variability was weaker, with an R² of 0.396. We believe this to be due to the smaller spatial variation in pollutant levels in this area.
Funding Information
- Medical Research Council (MR/N014464/1)
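The abstract reports an overall, a temporal, and a spatial R² but does not define them. The sketch below shows one convention used in parts of the satellite-based exposure modelling literature, and it is an assumption here rather than the authors' stated method: the spatial R² compares per-site means of observed and predicted values, and the temporal R² compares each day's deviation from its site mean. The function names, the record layout, and the toy numbers are all hypothetical and are not data from the paper.

```python
from collections import defaultdict
from statistics import mean


def r_squared(observed, predicted):
    """Squared Pearson correlation between two equal-length sequences."""
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def decompose_r2(records):
    """Return (overall, spatial, temporal) R² for out-of-sample predictions.

    records: iterable of (site_id, observed, predicted) triples.
    Assumed convention (not taken from the source):
      spatial  = R² between per-site mean observed and mean predicted
      temporal = R² between deviations from those per-site means
    """
    records = list(records)
    by_site = defaultdict(list)
    for site, obs, pred in records:
        by_site[site].append((obs, pred))

    site_obs, site_pred = [], []    # per-site means
    delta_obs, delta_pred = [], []  # daily deviations from site means
    for pairs in by_site.values():
        mo = mean(o for o, _ in pairs)
        mp = mean(p for _, p in pairs)
        site_obs.append(mo)
        site_pred.append(mp)
        delta_obs.extend(o - mo for o, _ in pairs)
        delta_pred.extend(p - mp for _, p in pairs)

    overall = r_squared([o for _, o, _ in records],
                        [p for _, _, p in records])
    spatial = r_squared(site_obs, site_pred)
    temporal = r_squared(delta_obs, delta_pred)
    return overall, spatial, temporal


# Made-up monitor readings (site, observed, predicted), for illustration only.
toy = [
    ("A", 10.0, 11.0), ("A", 20.0, 19.0), ("A", 15.0, 15.5),
    ("B", 12.0, 11.5), ("B", 22.0, 20.0), ("B", 17.0, 17.0),
    ("C", 11.0, 12.5), ("C", 21.0, 21.5), ("C", 16.0, 15.0),
]
overall, spatial, temporal = decompose_r2(toy)
print(f"overall={overall:.3f} spatial={spatial:.3f} temporal={temporal:.3f}")
```

Under this convention, sites whose long-term means differ only slightly leave little variance for the spatial component to explain, which is in line with the abstract's suggestion that low spatial variation in the area contributes to the weaker spatial R².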