Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions
Open Access
- 27 February 2017
- journal article
- research article
- Published by Oxford University Press (OUP) in Briefings in Bioinformatics
- Vol. 19 (5), 776-792
- https://doi.org/10.1093/bib/bbx008
Abstract
RNA-Seq is a widely used method for studying the behavior of genes under different biological conditions. An essential step in an RNA-Seq study is normalization, in which raw data are adjusted to account for factors that prevent direct comparison of expression measures. Errors in normalization can have a significant impact on downstream analysis, such as inflated false positives in differential expression analysis. An underemphasized feature of normalization is the assumptions on which the methods rely and how the validity of these assumptions can have a substantial impact on the performance of the methods. In this article, we explain how assumptions provide the link between raw RNA-Seq read counts and meaningful measures of gene expression. We examine normalization methods from the perspective of their assumptions, as an understanding of methodological assumptions is necessary for choosing methods appropriate for the data at hand. Furthermore, we discuss why normalization methods perform poorly when their assumptions are violated and how this causes problems in subsequent analysis. To analyze a biological experiment, researchers must select a normalization method with assumptions that are met and that produces a meaningful measure of expression for the given experiment.
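To make the link between assumptions and results concrete, the sketch below compares two common between-sample scaling approaches on hypothetical toy counts. It is an illustration only, not the article's own code. Total-count scaling assumes every sample produces the same total amount of mRNA. A median-of-ratios scheme (the idea behind DESeq size factors) instead assumes that most genes are not differentially expressed. The function names and the count values are hypothetical. In the toy data, one gene is strongly up-regulated in the second sample. This violates the total-count assumption but not the median-of-ratios one.

```python
import math
import statistics


def total_count_factors(samples):
    """Scale each sample by its library size relative to the mean library size.

    Assumption: every sample produces the same total amount of mRNA.
    """
    totals = [sum(s) for s in samples]
    mean_total = statistics.mean(totals)
    return [t / mean_total for t in totals]


def median_of_ratios_factors(samples):
    """Median-of-ratios size factors (DESeq-style idea, simplified sketch).

    Assumption: most genes are NOT differentially expressed, so the median
    ratio of a sample to the per-gene reference reflects technical scaling only.
    """
    n_genes = len(samples[0])
    # Per-gene reference: geometric mean across samples; skip genes with a zero count.
    reference = []
    for g in range(n_genes):
        values = [s[g] for s in samples]
        if min(values) > 0:
            log_mean = statistics.mean([math.log(v) for v in values])
            reference.append((g, math.exp(log_mean)))
    # Each sample's factor is the median of its ratios to the reference.
    return [statistics.median([s[g] / ref for g, ref in reference]) for s in samples]


def normalize(samples, factors):
    """Divide each sample's counts by its scaling factor."""
    return [[c / f for c in s] for s, f in zip(samples, factors)]


if __name__ == "__main__":
    # Hypothetical counts for five genes; gene 5 is up-regulated 10-fold in sample B,
    # while genes 1-4 are unchanged.
    a = [100, 200, 300, 400, 500]
    b = [100, 200, 300, 400, 5000]

    for name, factor_fn in [("total count", total_count_factors),
                            ("median of ratios", median_of_ratios_factors)]:
        na, nb = normalize([a, b], factor_fn([a, b]))
        fold = [round(y / x, 3) for x, y in zip(na, nb)]
        print(f"{name}: B/A fold changes = {fold}")
```

With total-count scaling, the single up-regulated gene inflates sample B's library size fourfold. As a result, the four unchanged genes appear 4-fold down-regulated. This is the kind of spurious difference that can produce false positives downstream. The median-of-ratios factors remain equal for both samples, because four of the five genes satisfy the "mostly not differentially expressed" assumption. Neither method is correct in general. Each is appropriate only when its assumption holds for the experiment at hand.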
Funding Information
- Pomona College (52007555)
- Harvey Mudd College (52007544)
- Howard Hughes Medical Institute