Removal of batch effects using distribution-matching residual networks

Open Access

13 April 2017

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 33 (16), 2539-2546
https://doi.org/10.1093/bioinformatics/btx196

Abstract

Sources of variability in experimentally derived data include measurement error in addition to the physical phenomena of interest. This measurement error is a combination of systematic components, originating from the measuring instrument, and random measurement errors. Several novel biological technologies, such as mass cytometry and single-cell RNA-seq (scRNA-seq), are plagued with systematic errors that may severely affect statistical analysis if the data are not properly calibrated. We propose a novel deep learning approach for removing systematic batch effects. Our method is based on a residual neural network, trained to minimize the Maximum Mean Discrepancy between the multivariate distributions of two replicates, measured in different batches. We apply our method to mass cytometry and scRNA-seq datasets, and demonstrate that it effectively attenuates batch effects. Our code and data are publicly available at https://github.com/ushaham/BatchEffectRemoval.git. Supplementary data are available at Bioinformatics online.
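
To make the training objective named in the abstract concrete, the following is a minimal sketch of a Maximum Mean Discrepancy (MMD) estimate between two samples. It is not taken from the authors' repository: the Gaussian kernel, the bandwidth `sigma`, the biased estimator, the function names `gaussian_kernel` and `mmd2`, and the synthetic "batches" are all assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between rows of a and rows of b (assumed kernel choice)."""
    # Pairwise squared Euclidean distances via the expansion |a|^2 + |b|^2 - 2 a.b
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    sq = np.maximum(sq, 0.0)  # clip tiny negatives caused by rounding
    return np.exp(-sq / (2.0 * sigma**2))

def mmd2(x, y, sigma=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    return (gaussian_kernel(x, x, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean()
            - 2.0 * gaussian_kernel(x, y, sigma).mean())

# Synthetic stand-ins for two replicates (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
batch1 = rng.normal(0.0, 1.0, size=(200, 5))
batch2 = rng.normal(1.0, 1.0, size=(200, 5))  # same shape, shifted mean: a crude "batch effect"
same = rng.normal(0.0, 1.0, size=(200, 5))    # fresh draw from batch1's distribution

shifted_mmd = mmd2(batch1, batch2)
matched_mmd = mmd2(batch1, same)
```

In this sketch, `shifted_mmd` should come out larger than `matched_mmd`, which is the property a calibration network can exploit: a map applied to one batch is trained so that the MMD between its output and the other batch becomes small.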

Funding Information

NIH (1R01HG008383-01A1)

This publication has 14 references indexed in Scilit:

Methods that remove batch effects while retaining group differences may lead to exaggerated confidence in downstream analyses
Biostatistics, 2015
Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets
Cell, 2015
Massively Parallel Single-Cell RNA-Seq for Marker-Free Decomposition of Tissues into Cell Types
Science, 2014
Smart-seq2 for sensitive full-length transcriptome profiling in single cells
Nature Methods, 2013
Normalization of mass cytometry data with bead standards
Cytometry Part A, 2013
The sva package for removing batch effects and other unwanted variation in high-throughput experiments
Bioinformatics, 2012
Tackling the widespread and critical impact of batch effects in high-throughput data
Nature Reviews Genetics, 2010
Per‐channel basis normalization methods for flow cytometry data
Cytometry Part A, 2009
Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis
PLoS Genetics, 2007
Adjusting batch effects in microarray expression data using empirical Bayes methods
Biostatistics, 2006

Cited by 120 articles