MCMSeq: Bayesian hierarchical modeling of clustered and repeated measures RNA sequencing experiments
Open Access
- 28 August 2020
- journal article
- research article
- Published by Springer Science and Business Media LLC in BMC Bioinformatics
- Vol. 21 (1), 1-20
- https://doi.org/10.1186/s12859-020-03715-y
Abstract
As the barriers to incorporating RNA sequencing (RNA-Seq) into biomedical studies continue to decrease, the complexity and size of RNA-Seq experiments are rapidly growing. Paired, longitudinal, and other correlated designs are becoming commonplace, and these studies offer immense potential for understanding how transcriptional changes within an individual over time differ depending on treatment or environmental conditions. While several methods have been proposed for handling repeated measures in RNA-Seq analyses, they are restricted to paired measurements, can only test for differences between two groups, and/or struggle to maintain nominal false positive and false discovery rates. In this work, we propose a Bayesian hierarchical negative binomial generalized linear mixed model framework that can flexibly model RNA-Seq counts from studies with arbitrarily many repeated observations, can include covariates, and maintains nominal false positive and false discovery rates in its posterior inference.

In simulation studies, we showed that our proposed method (MCMSeq) best combines high statistical power (i.e., sensitivity or recall) with maintenance of nominal false positive and false discovery rates compared with the other available strategies, especially at the smaller sample sizes investigated. This behavior was replicated in an application to real RNA-Seq data, where MCMSeq identified previously reported genes associated with tuberculosis infection in a cohort with longitudinal measurements.

Failing to account for repeated measurements when analyzing RNA-Seq experiments can result in significantly inflated false positive and false discovery rates. Of the methods we investigated, whether they modeled RNA-Seq counts directly or worked on transformed values, the Bayesian hierarchical model implemented in the mcmseq R package (available at https://github.com/stop-pre16/mcmseq) best combined sensitivity and nominal error rate control.
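To make the model class and the conclusion above concrete, here is a minimal simulation sketch in Python with NumPy. It assumes a negative binomial GLMM with one random intercept per subject, log(mu_ij) = log(s_ij) + x_ij'beta + b_i with b_i ~ N(0, sigma_b^2) and Var(y_ij) = mu_ij + alpha * mu_ij^2. This is not code from the mcmseq package, and it does not reproduce MCMSeq's priors or MCMC sampler. The function names (`simulate_counts`, `z_reject`, `false_positive_rates`), the two-group design with repeated time points, and all parameter values are hypothetical. The second half illustrates why ignoring repeated measures inflates false positives. It does so with simple z-tests on log counts, not with any of the methods the paper compares.

```python
import numpy as np

def simulate_counts(rng, n_subjects, n_times, beta, sigma_b, alpha, size_factor=1.0):
    """Draw one gene's counts from an ASSUMED NB GLMM with a random intercept per subject.

    log(mu_ij) = log(size_factor) + x_ij' beta + b_i,   b_i ~ N(0, sigma_b^2)
    y_ij ~ NB(mean mu_ij, variance mu_ij + alpha * mu_ij^2)
    Hypothetical design columns: intercept, group, time, group x time.
    """
    subject = np.repeat(np.arange(n_subjects), n_times)
    time = np.tile(np.arange(n_times), n_subjects).astype(float)
    group = (subject >= n_subjects // 2).astype(float)  # first half group 0, rest group 1
    X = np.column_stack([np.ones_like(time), group, time, group * time])
    b = rng.normal(0.0, sigma_b, size=n_subjects)  # subject-level random intercepts
    mu = size_factor * np.exp(X @ np.asarray(beta, dtype=float) + b[subject])
    # Gamma-Poisson mixture: Gamma(shape=1/alpha, scale=alpha*mu) has mean mu and
    # variance alpha*mu^2, so the Poisson draw is negative binomial with the variance above.
    lam = rng.gamma(shape=1.0 / alpha, scale=alpha * mu)
    return rng.poisson(lam), subject, group

def z_reject(a, b, crit=1.96):
    """Two-sample z-test on means (normal approximation); True if |z| > crit."""
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return abs(a.mean() - b.mean()) / se > crit

def false_positive_rates(n_genes=2000, n_subjects=40, n_times=4, sigma_b=1.0, alpha=0.2, seed=1):
    """Under a null group effect, compare a test that ignores subjects with a per-subject test."""
    rng = np.random.default_rng(seed)
    beta = [np.log(100.0), 0.0, 0.0, 0.0]  # no group, time, or interaction effect
    naive = per_subject = 0
    for _ in range(n_genes):
        y, subject, group = simulate_counts(rng, n_subjects, n_times, beta, sigma_b, alpha)
        logy = np.log1p(y)
        # Naive: treats all n_subjects * n_times observations as independent
        naive += z_reject(logy[group == 0], logy[group == 1])
        # Simplest cluster-aware alternative: one averaged value per subject
        subj_mean = np.bincount(subject, weights=logy) / n_times
        subj_group = group[::n_times]  # group label of each subject, in subject order
        per_subject += z_reject(subj_mean[subj_group == 0], subj_mean[subj_group == 1])
    return naive / n_genes, per_subject / n_genes
```

Under these assumed settings, `false_positive_rates()` should return a naive rejection rate well above the nominal 5%. The per-subject test should stay close to that nominal level, because strongly correlated repeated measures carry less independent information than their raw count suggests. Averaging within subjects is only the crudest fix. Unlike a mixed model, it discards the repeated structure and cannot test within-subject (time) effects.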