Using Supervised Principal Components Analysis to Assess Multiple Pollutant Effects

Abstract
Many investigations of the adverse health effects of multiple air pollutants analyze the time series involved by simultaneously entering the multiple pollutants into a Poisson log-linear model. This method can yield unstable parameter estimates when the pollutants involved suffer high intercorrelation; therefore, traditional approaches to dealing with multicollinearity, such as principal component analysis (PCA), have been promoted in this context. A characteristic of PCA is that its construction does not consider the relationship between the covariates and the adverse health outcomes. A refined version of PCA, supervised principal components analysis (SPCA), is proposed that specifically addresses this issue. Models controlling for long-term trends and weather effects were used in conjunction with each SPCA and PCA to estimate the association between multiple air pollutants and mortality for U.S. cities. The methods were compared further via a simulation study. Simulation studies demonstrated that SPCA, unlike PCA, was successful in identifying the correct subset of multiple pollutants associated with mortality. Because of this property, SPCA and PCA returned different estimates for the relationship between air pollution and mortality. Although a number of methods for assessing the effects of multiple pollutants have been proposed, such methods can falter in the presence of high correlation among pollutants. Both PCA and SPCA address this issue. By allowing the exclusion of pollutants that are not associated with the adverse health outcomes from the mixture of pollutants selected, SPCA offers a critical improvement over PCA.