Bayesian negative binomial regression for differential expression with confounding factors
- 24 April 2018
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 34 (19), 3349-3356
- https://doi.org/10.1093/bioinformatics/bty330
Abstract
Rapid adoption of high-throughput sequencing technologies has enabled better understanding of genome-wide changes in molecular profiles associated with phenotypic differences in biomedical studies. Often, these changes are due to multiple interacting factors. Most existing methods consider differential expression across two conditions for a single main factor, without accounting for other, confounding factors. In addition, they often depend on sophisticated ad-hoc pre-processing steps such as normalization, which restricts their adaptability to general experimental setups. Accurately deciphering genotype-phenotype relationships through complex multi-factor experimental designs requires effective statistical tools for genome-scale sequencing data profiled under multi-factor conditions. We have developed a novel Bayesian negative binomial regression (BNB-R) method for the analysis of RNA sequencing (RNA-seq) count data. In particular, the natural model parameterization removes the need for a normalization step, while the method can handle complex experimental designs involving multivariate dependence structures. Efficient Bayesian inference of model parameters is obtained by exploiting conditional conjugacy via novel data augmentation techniques. Comprehensive studies on both synthetic and real-world RNA-seq data demonstrate the superior performance of BNB-R in terms of the areas under both the receiver operating characteristic (ROC) and precision-recall (PR) curves. BNB-R is implemented in R and is available at https://github.com/siamakz/BNBR. Supplementary data are available at Bioinformatics online.
Funding Information
- National Science Foundation (CCF-1553281)
- USDA NIFA (06-505570-01006)
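The abstract states that BNB-R obtains efficient Bayesian inference by exploiting conditional conjugacy through data augmentation. The sketch below is not the BNB-R implementation (the authors' R code is at https://github.com/siamakz/BNBR). It is a minimal Python illustration, for a single gene, of two augmentations commonly used with negative binomial models: Chinese restaurant table (CRT) draws, which give the dispersion r a gamma full conditional, and Pólya-Gamma draws, which make the regression coefficients conditionally Gaussian. Several pieces are assumptions made for illustration. These include the model n_k ~ NB(r, sigmoid(x_k·β)) with an intercept, a main-factor column and a confounder (batch) column, the priors, and the truncated 50-term Pólya-Gamma approximation. The function names sample_crt, sample_pg, gibbs_nb_regression and simulate_counts are hypothetical. The paper's exact model and sampler may differ.

```python
import math
import random


def softplus(x: float) -> float:
    """Numerically stable log(1 + e^x), which equals -log(1 - sigmoid(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sample_crt(n: int, r: float, rng: random.Random) -> int:
    """CRT(n, r) draw: sum of Bernoulli(r / (r + t)) for t = 0..n-1."""
    return sum(1 for t in range(n) if rng.random() < r / (r + t))


def sample_pg(b: float, c: float, rng: random.Random, terms: int = 50) -> float:
    """Approximate Polya-Gamma PG(b, c) draw via a truncated sum of gammas."""
    shift = c * c / (4.0 * math.pi ** 2)
    total = sum(rng.gammavariate(b, 1.0) / ((k - 0.5) ** 2 + shift)
                for k in range(1, terms + 1))
    return total / (2.0 * math.pi ** 2)


def gibbs_nb_regression(counts, design, iters=300, burn_in=100,
                        prior_var=10.0, a0=1.0, b0=0.01, seed=0):
    """Hypothetical Gibbs sampler for one gene: n_k ~ NB(r, sigmoid(x_k . beta)).

    Priors (assumed): beta_d ~ N(0, prior_var), r ~ Gamma(shape a0, rate b0).
    Returns posterior means of beta (one per design column) and of r.
    """
    rng = random.Random(seed)
    n_samples, n_coef = len(counts), len(design[0])
    beta, r = [0.0] * n_coef, 1.0
    beta_sum, r_sum, kept = [0.0] * n_coef, 0.0, 0
    for it in range(iters):
        psi = [sum(x * b for x, b in zip(row, beta)) for row in design]
        # Dispersion: CRT augmentation yields a gamma full conditional for r.
        tables = sum(sample_crt(n, r, rng) for n in counts)
        rate = b0 + sum(softplus(p) for p in psi)
        r = rng.gammavariate(a0 + tables, 1.0 / rate)  # (shape, scale)
        # Polya-Gamma augmentation: likelihood becomes Gaussian in psi.
        omega = [sample_pg(n + r, p, rng) for n, p in zip(counts, psi)]
        kappa = [(n - r) / 2.0 for n in counts]
        # Coefficient-wise Gaussian updates, holding the other columns fixed.
        for d in range(n_coef):
            prec, lin = 1.0 / prior_var, 0.0
            for k in range(n_samples):
                x = design[k][d]
                rest = psi[k] - x * beta[d]
                prec += omega[k] * x * x
                lin += x * (kappa[k] - omega[k] * rest)
            new = rng.gauss(lin / prec, 1.0 / math.sqrt(prec))
            for k in range(n_samples):
                psi[k] += design[k][d] * (new - beta[d])
            beta[d] = new
        if it >= burn_in:
            kept += 1
            r_sum += r
            beta_sum = [s + b for s, b in zip(beta_sum, beta)]
    return [s / kept for s in beta_sum], r_sum / kept


def simulate_counts(design, beta, r, rng):
    """Draw NB(r, sigmoid(psi)) counts as a gamma-Poisson mixture."""
    counts = []
    for row in design:
        psi = sum(x * b for x, b in zip(row, beta))
        lam = rng.gammavariate(r, math.exp(psi))  # NB mean is r * e^psi
        # Knuth's multiplication method for a Poisson(lam) draw.
        limit, k, prod = math.exp(-lam), 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        counts.append(k)
    return counts


rng = random.Random(42)
# Columns: intercept, condition (main factor), batch (confounder).
design = [[1.0, float(k % 2), float((k // 2) % 2)] for k in range(24)]
true_beta = [math.log(4.0), 1.0, 0.5]
counts = simulate_counts(design, true_beta, r=5.0, rng=rng)
beta_hat, r_hat = gibbs_nb_regression(counts, design)
print([round(b, 2) for b in beta_hat], round(r_hat, 2))
```

Because the confounder enters the linear predictor as its own column, the condition coefficient `beta_hat[1]` is estimated while adjusting for batch. Under this parameterization the NB mean is r·e^ψ, so exp(β) for a covariate is a multiplicative fold change in the expected count.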