A practical solution to pseudoreplication bias in single-cell studies
Open Access
- 2 February 2021
- journal article
- research article
- Published by Springer Science and Business Media LLC in Nature Communications
- Vol. 12 (1), 1-9
- https://doi.org/10.1038/s41467-021-21038-1
Abstract
Cells from the same individual share common genetic and environmental backgrounds and are not statistically independent; therefore, they are subsamples or pseudoreplicates. Thus, single-cell data have a hierarchical structure that many current single-cell methods do not address, leading to biased inference, highly inflated type 1 error rates, and reduced robustness and reproducibility. This includes methods that use a batch effect correction for individual as a means of accounting for within-sample correlation. Here, we document this dependence across a range of cell types and show that pseudo-bulk aggregation methods are conservative and underpowered relative to mixed models. To compute differential expression within a specific cell type across treatment groups, we propose applying generalized linear mixed models with a random effect for individual, to properly account for both zero inflation and the correlation structure among measures from cells within an individual. Finally, we provide power estimates across a range of experimental conditions to assist researchers in designing appropriately powered studies.
Funding Information
- U.S. Department of Health & Human Services | National Institutes of Health (U01 NS036695)
- U.S. Department of Health & Human Services | NIH | National Cancer Institute (P30CA012197)
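Illustration of the abstract's pseudoreplication claim
The abstract states that treating cells from one individual as independent inflates type 1 error. The sketch below is a minimal simulation of that effect under assumed conditions, not the authors' analysis: it uses Gaussian data instead of zero-inflated counts, and it compares a naive cell-level t-test with a t-test on per-individual means (a simple form of pseudo-bulk aggregation) instead of fitting the generalized linear mixed model the authors propose. The design sizes, standard deviations, and all function names are hypothetical choices made for illustration.

```python
import random
from statistics import fmean

# Hypothetical design, chosen for illustration only (not taken from the paper).
N_INDIVIDUALS = 5      # individuals per treatment group
N_CELLS = 50           # cells sampled per individual
SD_INDIVIDUAL = 0.5    # between-individual standard deviation
SD_CELL = 1.0          # cell-to-cell standard deviation within an individual

# Two-sided 5% critical values of Student's t for the design above.
T_CRIT_CELLS = 1.965   # df = 2 * 5 * 50 - 2 = 498
T_CRIT_MEANS = 2.306   # df = 2 * 5 - 2 = 8


def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    m = fmean(values)
    v = sum((x - m) ** 2 for x in values) / (len(values) - 1)
    return m, v


def t_statistic(a, b):
    """Pooled-variance two-sample t statistic."""
    mean_a, var_a = mean_and_variance(a)
    mean_b, var_b = mean_and_variance(b)
    na, nb = len(a), len(b)
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    return (mean_a - mean_b) / (pooled * (1 / na + 1 / nb)) ** 0.5


def simulate_group(rng):
    """One list of cell values per individual, with no treatment effect."""
    group = []
    for _ in range(N_INDIVIDUALS):
        shift = rng.gauss(0.0, SD_INDIVIDUAL)  # shared by every cell of this individual
        group.append([shift + rng.gauss(0.0, SD_CELL) for _ in range(N_CELLS)])
    return group


def false_positive_rates(n_sims=1000, seed=1):
    """Return (cell-level rate, individual-mean rate) under the null hypothesis."""
    rng = random.Random(seed)
    naive_hits = 0
    aggregated_hits = 0
    for _ in range(n_sims):
        control = simulate_group(rng)
        treated = simulate_group(rng)

        # Naive analysis: every cell is treated as an independent replicate.
        cells_c = [x for individual in control for x in individual]
        cells_t = [x for individual in treated for x in individual]
        if abs(t_statistic(cells_c, cells_t)) > T_CRIT_CELLS:
            naive_hits += 1

        # Aggregated analysis: one mean per individual is the unit of replication.
        means_c = [fmean(individual) for individual in control]
        means_t = [fmean(individual) for individual in treated]
        if abs(t_statistic(means_c, means_t)) > T_CRIT_MEANS:
            aggregated_hits += 1
    return naive_hits / n_sims, aggregated_hits / n_sims


if __name__ == "__main__":
    naive_rate, aggregated_rate = false_positive_rates()
    print(f"cell-level test:       {naive_rate:.3f}")
    print(f"individual-mean test:  {aggregated_rate:.3f}")
```

Because no treatment effect is simulated, every rejection is a false positive. Under these assumptions the cell-level rate should come out far above the nominal 5%, while the individual-mean rate should stay close to it. The sketch says nothing about statistical power, which is where the abstract reports that aggregation falls short of mixed models.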