Global goodness‐of‐fit tests in logistic regression with sparse data
- 2 December 2002
- journal article
- conference paper
- Published by Wiley in Statistics in Medicine
- Vol. 21 (24), 3789-3801
- https://doi.org/10.1002/sim.1421
Abstract
The logistic regression model has become the standard analytical tool for binary responses in medical statistics. Methods for assessing goodness‐of‐fit, however, are less developed. This problem is especially pronounced when performing global goodness‐of‐fit tests with sparse data, that is, when the data contain only a small number of observations for each pattern of covariate values. In this situation it has long been known that the standard goodness‐of‐fit tests (residual deviance and Pearson chi‐square) behave unsatisfactorily if p‐values are calculated from the χ²‐distribution. As a remedy, the Hosmer–Lemeshow test is frequently recommended; it regroups the observations according to the estimated probabilities from the model in order to avoid sparseness. It has been shown, however, that the Hosmer–Lemeshow test also has deficiencies: for example, it depends heavily on the calculating algorithm, so different implementations might lead to different conclusions regarding the fit of the model. We present some alternative tests from the statistical literature which should also perform well with sparse data. Results from a simulation study show that some goodness‐of‐fit tests (for example, the Farrington test) have good properties regarding size and power and even outperform the Hosmer–Lemeshow test. We illustrate the various tests with an example from dermatology on occupational hand eczema in hairdressers. Copyright © 2002 John Wiley & Sons, Ltd.