Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions

Open Access

27 February 2017

journal article
research article
Published by Oxford University Press (OUP) in Briefings in Bioinformatics

Vol. 19 (5), 776-792
https://doi.org/10.1093/bib/bbx008

Abstract

RNA-Seq is a widely used method for studying the behavior of genes under different biological conditions. An essential step in an RNA-Seq study is normalization, in which raw data are adjusted to account for factors that prevent direct comparison of expression measures. Errors in normalization can have a significant impact on downstream analysis, such as inflated false positives in differential expression analysis. An underemphasized feature of normalization is the assumptions on which the methods rely and how the validity of these assumptions can have a substantial impact on the performance of the methods. In this article, we explain how assumptions provide the link between raw RNA-Seq read counts and meaningful measures of gene expression. We examine normalization methods from the perspective of their assumptions, as an understanding of methodological assumptions is necessary for choosing methods appropriate for the data at hand. Furthermore, we discuss why normalization methods perform poorly when their assumptions are violated and how this causes problems in subsequent analysis. To analyze a biological experiment, researchers must select a normalization method with assumptions that are met and that produces a meaningful measure of expression for the given experiment.
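As a small illustration of how an assumption links raw counts to a measure of expression, consider total-count scaling (counts per million), which implicitly assumes that the total amount of RNA is the same in every sample. The sketch below is not taken from the article: the function name `counts_per_million` and the toy counts are hypothetical, chosen only to show what can happen when that assumption is violated.

```python
def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


# Hypothetical toy data (an assumption for illustration only).
# Genes A, B and C have the same true expression in both samples,
# while gene D is strongly up-regulated in sample 2. Both samples
# are sequenced to the same depth of 1000 reads, so D takes reads
# away from the unchanged genes in sample 2.
sample_1 = {"A": 250, "B": 250, "C": 250, "D": 250}
sample_2 = {"A": 100, "B": 100, "C": 100, "D": 700}

cpm_1 = counts_per_million(sample_1)
cpm_2 = counts_per_million(sample_2)

# Ratio of normalized expression between the two samples, per gene.
ratios = {gene: cpm_2[gene] / cpm_1[gene] for gene in sample_1}

for gene, ratio in ratios.items():
    print(f"{gene}: CPM ratio sample_2 / sample_1 = {ratio:.2f}")
```

In this toy case the unchanged genes A, B and C appear down-regulated after scaling, because the equal-total-RNA assumption does not hold. A downstream differential expression test would likely flag them as false positives.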

Funding Information

Pomona College (52007555)
Harvey Mudd College (52007544)
Howard Hughes Medical Institute

