A comparative investigation of methods for logistic regression with separated or nearly separated data
- 6 September 2006
- journal article
- research article
- Published by Wiley in Statistics in Medicine
- Vol. 25 (24), 4216-4226
- https://doi.org/10.1002/sim.2687
Abstract
In logistic regression analysis of small or sparse data sets, results obtained by classical maximum likelihood methods cannot be generally trusted. In such analyses it may even happen that the likelihood meets the convergence criteria while at least one parameter estimate diverges to ±∞. This situation has been termed ‘separation’, and it typically occurs whenever no events are observed in one of the two groups defined by a dichotomous covariate. More generally, separation is caused by a linear combination of continuous or dichotomous covariates that perfectly separates events from non‐events. Separation implies infinite or zero maximum likelihood estimates of odds ratios, which are usually considered unrealistic. I provide some examples of separation and near‐separation in clinical data sets and discuss some options to analyse such data, including exact logistic regression analysis and a penalized likelihood approach. Both methods supply finite point estimates in case of separation. Profile penalized likelihood confidence intervals for parameters show excellent behaviour in terms of coverage probability and provide higher power than exact confidence intervals. General advantages of the penalized likelihood approach are discussed. Copyright © 2006 John Wiley & Sons, Ltd.
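To make the separation problem concrete, the following is a minimal sketch of a penalized likelihood fit on a toy data set in which one group has no events. It assumes the penalty is of the Firth type (a Jeffreys-prior penalty), which the abstract does not spell out. The function name `firth_logistic` and the toy data are hypothetical illustrations, not taken from the article.

```python
import numpy as np

def firth_logistic(X, y, max_iter=50, tol=1e-10):
    """Newton iterations on a Firth-type penalized score (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities
        w = p * (1.0 - p)                        # logistic variance weights
        info = X.T @ (X * w[:, None])            # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Diagonal of the hat matrix: h_i = w_i * x_i' (X'WX)^{-1} x_i
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Penalized score: the usual residual plus the correction h_i * (1/2 - p_i)
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Toy data (assumed): a dichotomous covariate with no events in the x = 1 group,
# so the ordinary maximum likelihood log odds ratio diverges to minus infinity.
x = np.repeat([0, 1], 10)
y = np.array([1] * 5 + [0] * 5 + [0] * 10)
X = np.column_stack([np.ones_like(x), x])        # intercept column plus covariate

beta = firth_logistic(X, y)
print("intercept:", beta[0], "log odds ratio:", beta[1])
```

For this saturated two-group model, the penalized estimate coincides with adding 0.5 to each cell of the 2×2 table, so the log odds ratio is log(0.5/10.5), about -3.04, which is finite where the ordinary estimate is not. The sketch covers point estimation only; the profile penalized likelihood confidence intervals mentioned in the abstract are not implemented here.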