Removal of batch effects using distribution-matching residual networks
Open Access
- 13 April 2017
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 33 (16), 2539-2546
- https://doi.org/10.1093/bioinformatics/btx196
Abstract
Sources of variability in experimentally derived data include measurement error in addition to the physical phenomena of interest. This measurement error is a combination of systematic components, originating from the measuring instrument, and random measurement errors. Several novel biological technologies, such as mass cytometry and single-cell RNA-seq (scRNA-seq), are plagued with systematic errors that may severely affect statistical analysis if the data are not properly calibrated. We propose a novel deep learning approach for removing systematic batch effects. Our method is based on a residual neural network, trained to minimize the Maximum Mean Discrepancy between the multivariate distributions of two replicates, measured in different batches. We apply our method to mass cytometry and scRNA-seq datasets, and demonstrate that it effectively attenuates batch effects. Our code and data are publicly available at https://github.com/ushaham/BatchEffectRemoval.git. Supplementary data are available at Bioinformatics online.
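The training objective named in the abstract, the Maximum Mean Discrepancy (MMD) between two batches, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The single Gaussian kernel, the bandwidth `sigma`, the biased estimator, the toy data, and the helper names `gaussian_kernel`, `mmd2`, and `residual_block` are all assumptions made for illustration, since the abstract does not specify the kernel or the network architecture. The residual block only shows the "identity plus learned correction" form of a residual network; no training is performed here.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    # Pairwise squared Euclidean distances via broadcasting.
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * sigma**2))


def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between samples x and y.

    The kernel and bandwidth are illustrative assumptions, not taken
    from the paper.
    """
    k_xx = gaussian_kernel(x, x, sigma)
    k_yy = gaussian_kernel(y, y, sigma)
    k_xy = gaussian_kernel(x, y, sigma)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


def residual_block(x, w1, b1, w2, b2):
    """Hypothetical residual map: the input plus a small learned correction."""
    hidden = np.maximum(x @ w1 + b1, 0.0)  # ReLU hidden layer
    return x + hidden @ w2 + b2


# Toy data: two "batches" drawn from the same population, one with a shift
# standing in for a systematic batch effect.
rng = np.random.default_rng(0)
batch_a = rng.normal(0.0, 1.0, size=(200, 5))
batch_b = rng.normal(0.5, 1.0, size=(200, 5))

mmd_same = mmd2(batch_a, batch_a)
mmd_shifted = mmd2(batch_a, batch_b)

# With a zero-initialized output layer the residual block is the identity,
# so calibration starts from "no change" (an assumed initialization).
w1 = rng.normal(0.0, 0.1, size=(5, 8))
b1 = np.zeros(8)
w2 = np.zeros((8, 5))
b2 = np.zeros(5)
calibrated = residual_block(batch_b, w1, b1, w2, b2)

print("MMD^2, identical samples:", mmd_same)
print("MMD^2, shifted batch:", mmd_shifted)
```

In a full pipeline the weights of the residual map would be optimized so that `mmd2(batch_a, calibrated)` decreases. The residual form means the network only has to learn the deviation from the identity, which suits batch effects that are small relative to the biological signal.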
Funding Information
- NIH (1R01HG008383-01A1)