A solution to minimum sample size for regressions

Open Access

21 February 2020

journal article
research article
Published by Public Library of Science (PLoS) in PLOS ONE

Vol. 15 (2), e0229345
https://doi.org/10.1371/journal.pone.0229345

Abstract

Regressions and meta-regressions are widely used to estimate patterns and effect sizes in various disciplines. However, many biological and medical analyses use relatively small sample sizes (N), contributing to concerns about reproducibility. What is the minimum N needed to identify the most plausible data pattern using regressions? Statistical power analysis is often used to answer that question, but it has its own problems and logically should follow model selection, which first identifies the most plausible model. Here we generate null, simple linear, and quadratic data with different variances and effect sizes. We then sample those data and use information-theoretic model selection to evaluate minimum N for regression models. We also evaluate the use of the coefficient of determination (R²) for this purpose; it is widely used but not recommended. With very low variance, both false positives and false negatives occurred at N < 8, but data shape was always clearly identified at N ≥ 8. With high variance, accurate inference was stable at N ≥ 25. Those outcomes were consistent at different effect sizes. Akaike Information Criterion weights (AICc w_i) were essential to clearly identify patterns (e.g., simple linear vs. null); R² or adjusted R² values were not useful. We conclude that a minimum N = 8 is informative given very little variance, but a minimum N ≥ 25 is required with more variance. Alternative models are better compared using information-theoretic indices such as AIC than using R² or adjusted R². Insufficient N and R²-based model selection apparently contribute to confusion and low reproducibility in various disciplines. To avoid those problems, we recommend that research based on regressions or meta-regressions use N ≥ 25.
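
The model comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes ordinary least-squares polynomial fits and the standard small-sample AICc formula for Gaussian errors, and the function name `aicc_weights`, the slope, the noise level, and the random seed are all hypothetical choices made for illustration.

```python
import numpy as np


def aicc_weights(x, y):
    """Return AICc weights for null, linear, and quadratic least-squares fits.

    Illustrative helper (not from the paper). Needs len(x) >= 6 so that the
    AICc correction term is defined for the quadratic model.
    """
    n = len(x)
    scores = []
    for degree in (0, 1, 2):  # 0 = null (mean only), 1 = linear, 2 = quadratic
        coeffs = np.polyfit(x, y, degree)
        rss = float(np.sum((y - np.polyval(coeffs, x)) ** 2))
        k = degree + 2  # regression coefficients plus the error variance
        aic = n * np.log(rss / n) + 2 * k
        scores.append(aic + 2 * k * (k + 1) / (n - k - 1))  # small-sample correction
    delta = np.array(scores) - min(scores)  # difference from the best model
    w = np.exp(-0.5 * delta)
    return w / w.sum()  # weights sum to 1 across the candidate models


# Assumed example data: a linear signal with low noise, sampled at N = 25.
rng = np.random.default_rng(0)
n = 25
x = np.linspace(0.0, 1.0, n)
y = 2.0 * x + rng.normal(0.0, 0.05, n)

weights = aicc_weights(x, y)
for name, w in zip(("null", "linear", "quadratic"), weights):
    print(f"{name:9s} AICc weight = {w:.3f}")
```

Repeating the last step over many simulated samples, while varying `n` and the noise standard deviation, gives a rough picture of how often the weights favor the model that actually generated the data.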
