Using generalized additive models to reduce residual confounding

6 December 2004

journal article
research article
Published by Wiley in Statistics in Medicine

Vol. 23 (24), 3781-3801
https://doi.org/10.1002/sim.2073

Abstract

Traditionally, confounding by continuous variables is controlled by including a linear or categorical term in a regression model. Residual confounding occurs when the effect of the confounder on the outcome is mis‐modelled. A continuous representation of a covariate was previously shown to result in a less biased estimate of the adjusted exposure effect than categorization, provided the functional form of the covariate–outcome relationship is correctly specified. However, this form is rarely known. In contrast to parametric regression, generalized additive models (GAM) fit a smooth dose–response curve to the data, without requiring a priori knowledge of the functional form. We used simulations to compare parametric multiple logistic regression with its non‐parametric GAM extension in their ability to control for a continuous confounder. We also investigated several issues related to the implementation of GAM in this context, including: (i) selecting the degrees of freedom; and (ii) alternative criteria for inclusion/exclusion of the potential confounder and for choosing between parametric and non‐parametric representation of its effect. The impact of the shape and strength of the confounder–disease association, sample size, and the correlation between the confounder and exposure was investigated. Simulations showed that when the confounder has a non‐linear association with the outcome, GAM modelling, compared to a parametric representation, (i) reduced the mean squared error for the adjusted exposure effect and (ii) avoided inflation of the type I error for testing the exposure effect. When the true confounder–outcome relationship was linear, GAM performed as well as the parametric logistic regression.
When modelling a continuous exposure non‐parametrically, in the presence of a continuous confounder, our results suggest that assuming a linear effect of the confounder and focussing on the non‐linearity of the exposure–outcome relationship leads to spurious findings of non‐linearity: joint non‐linear modelling is necessary. Overall, our results suggest that the use of GAM to reduce residual confounding offers several improvements over conventional parametric modelling. Copyright © 2004 John Wiley & Sons, Ltd.
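The residual-confounding mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation design: the sample size, the U-shaped confounder effect, the null exposure effect, and all function names are assumptions chosen for illustration. It also uses a cubic regression spline inside an ordinary logistic fit as a simple stand-in for a GAM smooth, rather than a true GAM fitted by backfitting or penalized likelihood. It compares the estimated exposure log odds ratio when the confounder is adjusted for linearly with the estimate when it is adjusted for flexibly.

```python
import numpy as np

def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-t))

def fit_logistic(X, y, n_iter=100, tol=1e-10):
    """Logistic regression by Newton-Raphson; returns (coefficients, standard errors)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        step = np.linalg.solve(info, X.T @ (y - p))   # Newton step
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = expit(X @ beta)
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se

def cubic_spline_basis(z, knots):
    """Truncated-power cubic spline basis (a stand-in for a GAM smooth of z)."""
    cols = [z, z**2, z**3] + [np.clip(z - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

# Assumed data-generating process (illustrative only):
# the confounder z affects both exposure and outcome through z**2,
# and the true exposure effect on the outcome is zero.
rng = np.random.default_rng(2004)
n = 5000
z = rng.uniform(-2.0, 2.0, n)
x = rng.binomial(1, expit(z**2 - 1.0)).astype(float)
y = rng.binomial(1, expit(-1.5 + 0.0 * x + z**2)).astype(float)

ones = np.ones(n)
knots = np.quantile(z, [0.25, 0.5, 0.75])
X_linear = np.column_stack([ones, x, z])                             # linear adjustment
X_spline = np.column_stack([ones, x, cubic_spline_basis(z, knots)])  # flexible adjustment

beta_lin, se_lin = fit_logistic(X_linear, y)
beta_spl, se_spl = fit_logistic(X_spline, y)

b_linear, b_spline = beta_lin[1], beta_spl[1]
print(f"exposure log-OR, linear adjustment: {b_linear:.3f} (SE {se_lin[1]:.3f})")
print(f"exposure log-OR, spline adjustment: {b_spline:.3f} (SE {se_spl[1]:.3f})")
```

Under these assumptions the linear adjustment leaves the exposure estimate well away from its true value of zero, while the flexible adjustment brings it close to zero. The number of knots plays the role of the degrees of freedom discussed in the abstract; in practice a dedicated GAM implementation would choose the amount of smoothing more carefully.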

