Removing Batch Effects in Analysis of Expression Microarray Data: An Evaluation of Six Batch Adjustment Methods
Open Access
- 28 February 2011
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 6 (2), e17238
- https://doi.org/10.1371/journal.pone.0017238
Abstract
The expression microarray is a frequently used approach to study gene expression on a genome-wide scale. However, the data produced by the thousands of microarray studies published annually are confounded by “batch effects,” the systematic error introduced when samples are processed in multiple batches. Although batch effects can be reduced by careful experimental design, they cannot be eliminated unless the whole study is done in a single batch. A number of programs are now available to adjust microarray data for batch effects prior to analysis. We systematically evaluated six of these programs using multiple measures of precision, accuracy and overall performance. ComBat, an Empirical Bayes method, outperformed the other five programs by most metrics. We also showed that it is essential to standardize expression data at the probe level when testing for correlation of expression profiles, due to a sizeable probe effect in microarray data that can inflate the correlation among replicates and unrelated samples.
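The point about probe-level standardization can be illustrated with a small simulation. The sketch below is not the authors' procedure or data: it assumes a hypothetical probes-by-samples matrix in which every sample shares the same per-probe baseline (the "probe effect") plus independent noise, and the function names `standardize_probes` and `mean_pairwise_correlation` are made up for this example. It compares the average sample-to-sample Pearson correlation before and after z-scoring each probe across samples.

```python
import numpy as np

def standardize_probes(expr):
    """Z-score each probe (row) across samples (columns)."""
    mean = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = 1.0  # leave constant probes at zero instead of dividing by 0
    return (expr - mean) / sd

def mean_pairwise_correlation(expr):
    """Average Pearson correlation over all distinct pairs of samples."""
    corr = np.corrcoef(expr, rowvar=False)  # samples are columns
    off_diagonal = corr[~np.eye(corr.shape[0], dtype=bool)]
    return off_diagonal.mean()

# Simulated data (an assumption for illustration, not real microarray values):
# a large probe-specific baseline shared by all samples, plus independent noise,
# so the samples are biologically "unrelated" by construction.
rng = np.random.default_rng(0)
n_probes, n_samples = 2000, 20
probe_effect = rng.normal(8.0, 2.0, size=(n_probes, 1))
noise = rng.normal(0.0, 0.3, size=(n_probes, n_samples))
expr = probe_effect + noise

raw_r = mean_pairwise_correlation(expr)
std_r = mean_pairwise_correlation(standardize_probes(expr))

print(f"mean correlation, raw data:            {raw_r:.3f}")
print(f"mean correlation, probe-standardized: {std_r:.3f}")
```

Under these assumptions the raw correlation is close to 1 even though the samples share nothing but the probe baseline, while the standardized correlation falls to around zero. It comes out slightly negative, roughly -1/(n_samples - 1), because centering each probe forces its values to sum to zero across samples.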