Predicting Fine Particulate Matter (PM2.5) in the Greater London Area: An Ensemble Approach using Machine Learning Methods

Open Access

11 March 2020

journal article
research article
Published by MDPI AG in Remote Sensing

Vol. 12 (6), 914
https://doi.org/10.3390/rs12060914

Abstract

Estimating air pollution exposure has long been a challenge for environmental health researchers. Technological advances and novel machine learning methods have allowed us to increase the geographic range and accuracy of exposure models, making them a valuable tool in conducting health studies and identifying hotspots of pollution. Here, we have created a prediction model for daily PM2.5 levels in the Greater London area from 1st January 2005 to 31st December 2013 using an ensemble machine learning approach incorporating satellite aerosol optical depth (AOD), land use, and meteorological data. The predictions were made on a 1 km × 1 km scale over 3960 grid cells. The ensemble included predictions from three different machine learners: a random forest (RF), a gradient boosting machine (GBM), and a k-nearest neighbor (KNN) approach. Our ensemble model performed very well, with a ten-fold cross-validated R² of 0.828. Of the three machine learners, the random forest outperformed the GBM and KNN. Our model was particularly adept at predicting day-to-day changes in PM2.5 levels, with an out-of-sample temporal R² of 0.882. However, its ability to predict spatial variability was weaker, with an R² of 0.396. We believe this to be due to the smaller spatial variation in pollutant levels in this area.
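The abstract reports an overall, a temporal, and a spatial R² without stating how each was computed. The sketch below illustrates one convention often used in the satellite AOD literature, and it is an assumption here, not necessarily the authors' procedure: the spatial R² compares per-monitor mean observed and predicted values, and the temporal R² compares each day's deviation from its monitor's mean. R² is taken as the squared Pearson correlation, which is also an assumption. The function names and the toy records are hypothetical.

```python
from collections import defaultdict


def r_squared(observed, predicted):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def decompose_r2(records):
    """Split fit quality into overall, spatial, and temporal R².

    records: list of (site_id, observed, predicted) tuples, one per
    held-out monitor-day (assumed layout, for illustration only).
    """
    by_site = defaultdict(list)
    for site, obs, pred in records:
        by_site[site].append((obs, pred))

    site_obs_means, site_pred_means = [], []
    obs_dev, pred_dev = [], []
    for pairs in by_site.values():
        mean_o = sum(o for o, _ in pairs) / len(pairs)
        mean_p = sum(p for _, p in pairs) / len(pairs)
        # Spatial component: one long-term mean per monitor.
        site_obs_means.append(mean_o)
        site_pred_means.append(mean_p)
        # Temporal component: daily deviation from that monitor's mean.
        for o, p in pairs:
            obs_dev.append(o - mean_o)
            pred_dev.append(p - mean_p)

    return {
        "overall": r_squared([o for _, o, _ in records],
                             [p for _, _, p in records]),
        "spatial": r_squared(site_obs_means, site_pred_means),
        "temporal": r_squared(obs_dev, pred_dev),
    }


# Toy data (made up): three monitors with nearly identical long-term means
# but large day-to-day swings, loosely mimicking a region with little
# spatial contrast.
toy_records = [
    ("A", 10, 11), ("A", 20, 20), ("A", 30, 32),
    ("B", 11, 10), ("B", 21, 21), ("B", 31, 29),
    ("C", 12, 12), ("C", 22, 23), ("C", 32, 31),
]
result = decompose_r2(toy_records)
print(result)
```

On this toy input the temporal R² comes out close to 1 while the spatial R² is 0.25, even though the prediction errors are only a unit or two in size. When monitors differ little in their long-term means, small errors in those means can dominate the spatial comparison, which is consistent with the explanation the abstract offers for its weaker spatial R².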

Funding Information

Medical Research Council (MR/N014464/1)

Cited by 77 articles