Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors
- 2 April 2018
- journal article
- research article
- Published by Springer Science and Business Media LLC in Nature Biotechnology
- Vol. 36 (5), 421-427
- https://doi.org/10.1038/nbt.4091
Abstract
Differences in gene expression between individual cells of the same type are measured across batches and used to correct technical artifacts in single-cell RNA-sequencing data. Large-scale single-cell RNA sequencing (scRNA-seq) data sets that are produced in different laboratories and at different times contain batch effects that may compromise the integration and interpretation of the data. Existing scRNA-seq analysis methods incorrectly assume that the composition of cell populations is either known or identical across batches. We present a strategy for batch correction based on the detection of mutual nearest neighbors (MNNs) in the high-dimensional expression space. Our approach does not rely on predefined or equal population compositions across batches; instead, it requires only that a subset of the population be shared between batches. We demonstrate the superiority of our approach compared with existing methods by using both simulated and real scRNA-seq data sets. Using multiple droplet-based scRNA-seq data sets, we demonstrate that our MNN batch-effect-correction method can be scaled to large numbers of cells.
This publication has 30 references indexed in Scilit:
- svaseq: removing batch effects and other unwanted noise from sequencing data. Nucleic Acids Research, 2014
- Normalization of RNA-seq data using factor analysis of control genes or samples. Nature Biotechnology, 2014
- Single-Cell Trajectory Detection Uncovers Progression and Regulatory Coordination in Human B Cell Development. Cell, 2014
- Massively Parallel Single-Cell RNA-Seq for Marker-Free Decomposition of Tissues into Cell Types. Science, 2014
- featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics, 2013
- Accounting for technical noise in single-cell RNA-seq experiments. Nature Methods, 2013
- Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nature Methods, 2013
- Quantifying Disorder through Conditional Entropy: An Application to Fluid Mixing. PLOS ONE, 2013
- STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 2012
- Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics, 2006
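
The matching step named in the abstract can be illustrated with a minimal Python sketch that finds mutual nearest neighbors between two batches. It is not the authors' implementation: the function names (`mutual_nearest_neighbors`, `mean_batch_shift`), the Euclidean distance, the parameter `k`, the toy data, and the single averaged shift are all assumptions made for illustration.

```python
import numpy as np


def mutual_nearest_neighbors(batch_a, batch_b, k=1):
    """Return sorted (i, j) pairs where cell i of batch_a and cell j of
    batch_b are each among the other's k nearest neighbors.

    Assumption: plain Euclidean distance on the rows of each matrix.
    """
    # Pairwise squared distances, shape (n_a, n_b).
    diff = batch_a[:, None, :] - batch_b[None, :, :]
    dist = np.sum(diff ** 2, axis=2)
    # For each cell in A, the indices of its k nearest cells in B.
    nn_ab = np.argsort(dist, axis=1)[:, :k]
    # For each cell in B, the indices of its k nearest cells in A.
    nn_ba = np.argsort(dist.T, axis=1)[:, :k]
    pairs = []
    for i in range(batch_a.shape[0]):
        for j in nn_ab[i]:
            # Keep the pair only if the relationship holds in both directions.
            if i in nn_ba[j]:
                pairs.append((i, int(j)))
    return sorted(pairs)


def mean_batch_shift(batch_a, batch_b, pairs):
    """Average difference (B minus A) over the matched pairs.

    Assumption: one global shift, a deliberate simplification for this sketch.
    """
    diffs = np.array([batch_b[j] - batch_a[i] for i, j in pairs])
    return diffs.mean(axis=0)


# Toy data (hypothetical): batch A holds two cells of a shared population
# plus one cell of a population that is absent from batch B.
batch_a = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
# Batch B holds only the shared population, offset by +1 on the first axis.
batch_b = np.array([[1.0, 0.0], [1.0, 1.0]])

pairs = mutual_nearest_neighbors(batch_a, batch_b, k=1)
print(pairs)  # [(0, 0), (1, 1)]

shift = mean_batch_shift(batch_a, batch_b, pairs)
corrected_b = batch_b - shift
```

In this toy case the cell unique to batch A (row 2) has a nearest neighbor in batch B, but that neighbor does not point back to it, so it forms no pair. This mirrors the abstract's requirement that only a subset of the population be shared between batches.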