A practical solution to pseudoreplication bias in single-cell studies

Open Access

2 February 2021

journal article
research article
Published by Springer Science and Business Media LLC in Nature Communications

Vol. 12 (1), 1-9
https://doi.org/10.1038/s41467-021-21038-1

Abstract

Cells from the same individual share common genetic and environmental backgrounds and are not statistically independent; therefore, they are subsamples or pseudoreplicates. Thus, single-cell data have a hierarchical structure that many current single-cell methods do not address, leading to biased inference, highly inflated type 1 error rates, and reduced robustness and reproducibility. This includes methods that use a batch effect correction for individual as a means of accounting for within-sample correlation. Here, we document this dependence across a range of cell types and show that pseudo-bulk aggregation methods are conservative and underpowered relative to mixed models. To compute differential expression within a specific cell type across treatment groups, we propose applying generalized linear mixed models with a random effect for individual, to properly account for both zero inflation and the correlation structure among measures from cells within an individual. Finally, we provide power estimates across a range of experimental conditions to assist researchers in designing appropriately powered studies.
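
The inflation described above can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not the authors' analysis: it uses Gaussian data with no zero inflation, and it does not fit a generalized linear mixed model. It only contrasts a test that treats every cell as independent with a test on per-individual means when there is no true treatment effect. All function names, the sample sizes, and the variance values are hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_group(rng, n_individuals, n_cells, sd_between, sd_within):
    """Return one list of cell values per individual, with no treatment effect."""
    group = []
    for _ in range(n_individuals):
        # Shift shared by all cells of one individual: the source of correlation.
        shift = rng.gauss(0.0, sd_between)
        group.append([shift + rng.gauss(0.0, sd_within) for _ in range(n_cells)])
    return group


def cell_level_stat(a, b):
    """Welch-type statistic that wrongly treats every cell as independent."""
    x = [v for individual in a for v in individual]
    y = [v for individual in b for v in individual]
    se = (statistics.variance(x) / len(x) + statistics.variance(y) / len(y)) ** 0.5
    return (statistics.fmean(x) - statistics.fmean(y)) / se


def pseudobulk_stat(a, b):
    """Pooled two-sample t statistic computed on one mean per individual."""
    x = [statistics.fmean(individual) for individual in a]
    y = [statistics.fmean(individual) for individual in b]
    df = len(x) + len(y) - 2
    pooled = ((len(x) - 1) * statistics.variance(x)
              + (len(y) - 1) * statistics.variance(y)) / df
    se = (pooled * (1 / len(x) + 1 / len(y))) ** 0.5
    return (statistics.fmean(x) - statistics.fmean(y)) / se


def type1_rates(n_sims=300, n_individuals=10, n_cells=50,
                sd_between=1.0, sd_within=1.0, seed=1):
    """Estimate false-positive rates at a nominal 5% level under the null."""
    rng = random.Random(seed)
    cell_hits = 0
    bulk_hits = 0
    for _ in range(n_sims):
        a = simulate_group(rng, n_individuals, n_cells, sd_between, sd_within)
        b = simulate_group(rng, n_individuals, n_cells, sd_between, sd_within)
        # 1.96: two-sided normal critical value (500 cells per group).
        cell_hits += abs(cell_level_stat(a, b)) > 1.96
        # 2.101: two-sided t critical value for 18 degrees of freedom,
        # which matches the default of 10 individuals per group only.
        bulk_hits += abs(pseudobulk_stat(a, b)) > 2.101
    return cell_hits / n_sims, bulk_hits / n_sims


cell_rate, bulk_rate = type1_rates()
print(f"cell-level false-positive rate: {cell_rate:.2f}")
print(f"per-individual false-positive rate: {bulk_rate:.2f}")
```

With the assumed defaults (equal between-individual and within-individual variance, so an intraclass correlation of 0.5), the cell-level test should reject far more often than the nominal 5%, while the per-individual test should stay near it. The hard-coded critical value 2.101 must be changed if n_individuals is changed. The sketch says nothing about power, where the abstract reports that mixed models outperform pseudo-bulk aggregation.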

Funding Information

U.S. Department of Health & Human Services | National Institutes of Health (U01 NS036695)
U.S. Department of Health & Human Services | NIH | National Cancer Institute (P30CA012197)

