Normalization of RNA-seq data using factor analysis of control genes or samples
Open Access
- 24 August 2014
- journal article
- research article
- Published by Springer Science and Business Media LLC in Nature Biotechnology
- Vol. 32 (9), 896-902
- https://doi.org/10.1038/nbt.2931
Abstract
Remove unwanted variation (RUV) is a new statistical method for RNA-seq data normalization that uses control genes or samples to improve differential expression analysis. Normalization of RNA-sequencing (RNA-seq) data has proven essential to ensure accurate inference of expression levels. Here, we show that usual normalization approaches mostly account for sequencing depth and fail to correct for library preparation and other more complex unwanted technical effects. We evaluate the performance of the External RNA Control Consortium (ERCC) spike-in controls and investigate the possibility of using them directly for normalization. We show that the spike-ins are not reliable enough to be used in standard global-scaling or regression-based normalization procedures. We propose a normalization strategy, called remove unwanted variation (RUV), that adjusts for nuisance technical effects by performing factor analysis on suitable sets of control genes (e.g., ERCC spike-ins) or samples (e.g., replicate libraries). Our approach leads to more accurate estimates of expression fold-changes and tests of differential expression compared to state-of-the-art normalization methods. In particular, RUV promises to be valuable for large collaborative projects involving multiple laboratories, technicians, and/or sequencing platforms.
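The control-gene idea in the abstract can be illustrated with a minimal sketch. It assumes that log counts follow a linear model in which the control genes carry only unwanted variation, so a factor analysis (here a plain SVD) of the control genes estimates the nuisance factors, which are then regressed out of every gene. The function name `ruv_g_sketch`, the `pseudocount` parameter, and the choice of SVD are illustrative assumptions and are not taken from the abstract; the published method may differ in details such as prior scaling or how the factors enter the differential expression model.

```python
import numpy as np


def ruv_g_sketch(counts, control_idx, k=1, pseudocount=1.0):
    """Illustrative control-gene adjustment (hypothetical helper, not the paper's code).

    counts      : array of shape (n_samples, n_genes) with non-negative counts
    control_idx : indices of genes assumed to be unaffected by the biology of interest
    k           : number of unwanted factors to estimate (an assumption to be tuned)
    Returns the adjusted log counts and the estimated factors W (n_samples x k).
    """
    counts = np.asarray(counts, dtype=float)

    # Work on the log scale; the pseudocount avoids log(0).
    Y = np.log(counts + pseudocount)

    # Center each gene so the factors describe variation around the gene mean.
    Z = Y - Y.mean(axis=0, keepdims=True)

    # Factor analysis on control genes only: their variation across samples
    # is assumed to be purely technical.
    U, _, _ = np.linalg.svd(Z[:, control_idx], full_matrices=False)
    W = U[:, :k]

    # Regress every gene on the estimated factors and subtract the fitted part.
    alpha, *_ = np.linalg.lstsq(W, Y, rcond=None)
    return Y - W @ alpha, W
```

In a differential expression analysis one would more likely include the estimated `W` as covariates in the model alongside the factors of interest, rather than analyse the adjusted values directly; the subtraction above is only meant to make the effect of the factors visible.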