Global goodness‐of‐fit tests in logistic regression with sparse data

2 December 2002

journal article
conference paper
Published by Wiley in Statistics in Medicine

Vol. 21 (24), 3789-3801
https://doi.org/10.1002/sim.1421

Abstract

The logistic regression model has become the standard tool for analysing binary responses in medical statistics. Methods for assessing goodness‐of‐fit, however, are less developed, and this problem is especially pronounced when performing global goodness‐of‐fit tests with sparse data, that is, when the data contain only a small number of observations for each pattern of covariate values. In this situation it has long been known that the standard goodness‐of‐fit tests (residual deviance and Pearson chi‐square) behave unsatisfactorily if p‐values are calculated from the χ²‐distribution. As a remedy, the Hosmer–Lemeshow test is frequently recommended; to avoid sparseness, it regroups the observations according to the estimated probabilities from the model. It has been shown, however, that the Hosmer–Lemeshow test also has some deficiencies; for example, it depends heavily on the calculating algorithm, so different implementations might lead to different conclusions regarding the fit of the model. We present some alternative tests from the statistical literature which should also perform well with sparse data. Results from a simulation study show that some goodness‐of‐fit tests (for example, the Farrington test) have good properties regarding size and power and even outperform the Hosmer–Lemeshow test. We illustrate the various tests with an example from dermatology on occupational hand eczema in hairdressers. Copyright © 2002 John Wiley & Sons, Ltd.
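
The dependence of the Hosmer–Lemeshow test on the grouping rule can be illustrated with a minimal sketch. It is not the article's implementation: the function names (`hl_statistic`, `equal_size_groups`, `fixed_cutpoint_groups`) and the toy data are assumptions made here for illustration. The statistic sums, over groups, the squared difference between observed and expected events divided by n·p̄·(1 − p̄), and is usually referred to a χ²‐distribution with g − 2 degrees of freedom. Two common grouping rules (equal-sized groups of sorted fitted probabilities versus fixed cutpoints on the probability scale) can give different values on the same data.

```python
from math import fsum


def hl_statistic(y, p, groups):
    """Hosmer-Lemeshow-type statistic for a given partition of the observations.

    y: 0/1 outcomes; p: fitted probabilities strictly between 0 and 1;
    groups: list of index lists. Empty groups are skipped.
    """
    stat = 0.0
    for idx in groups:
        if not idx:
            continue
        n = len(idx)
        observed = sum(y[i] for i in idx)
        expected = fsum(p[i] for i in idx)
        mean_p = expected / n
        stat += (observed - expected) ** 2 / (n * mean_p * (1.0 - mean_p))
    return stat


def equal_size_groups(p, g=10):
    """Sort by fitted probability and cut into g groups of (nearly) equal size."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    return [order[k * n // g:(k + 1) * n // g] for k in range(g)]


def fixed_cutpoint_groups(p, g=10):
    """Group by fixed cutpoints 1/g, 2/g, ... on the probability scale."""
    groups = [[] for _ in range(g)]
    for i, pi in enumerate(p):
        groups[min(int(pi * g), g - 1)].append(i)
    return groups


# Toy data, invented for illustration only (not from the article).
p = [0.05, 0.08, 0.11, 0.12, 0.15, 0.18, 0.22, 0.25, 0.31, 0.35,
     0.38, 0.42, 0.47, 0.55, 0.58, 0.63, 0.71, 0.78, 0.86, 0.93]
y = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1,
     0, 1, 0, 1, 1, 0, 1, 1, 1, 1]

print("equal-size groups:    ", hl_statistic(y, p, equal_size_groups(p, 5)))
print("fixed-cutpoint groups:", hl_statistic(y, p, fixed_cutpoint_groups(p, 5)))
```

Both calls use the same fitted probabilities and outcomes; only the partition changes, which is one way different implementations can arrive at different test results.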


Cited by 53 articles