baySeq: Empirical Bayesian methods for identifying differential expression in sequence count data
Open Access
- 10 August 2010
- journal article
- research article
- Published by Springer Science and Business Media LLC in BMC Bioinformatics
- Vol. 11 (1), 1-14
- https://doi.org/10.1186/1471-2105-11-422
Abstract
High throughput sequencing has become an important technology for studying expression levels in many types of genomic, and particularly transcriptomic, data. One key way of analysing such data is to look for elements of the data which display particular patterns of differential expression in order to take these forward for further analysis and validation. We propose a framework for defining patterns of differential expression and develop a novel algorithm, baySeq, which uses an empirical Bayes approach to detect these patterns of differential expression within a set of sequencing samples. The method assumes a negative binomial distribution for the data and derives an empirically determined prior distribution from the entire dataset. We examine the performance of the method on real and simulated data. Our method performs at least as well as, and often better than, existing methods for analyses of pairwise differential expression in both real and simulated data. When we compare methods for the analysis of data from experimental designs involving multiple sample groups, our method again shows substantial gains in performance. We believe that this approach thus represents an important step forward for the analysis of count data from sequencing experiments.
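The core computation described in the abstract can be illustrated with a minimal Python sketch, under stated simplifying assumptions. For one row of counts it compares two patterns: "equal expression" (all samples share one negative binomial mean and dispersion) and "differential expression" (each sample group has its own). Each pattern's likelihood is integrated over an empirical prior, approximated here by averaging over a sample of (mean, dispersion) pairs estimated from the whole dataset. All function names, the method-of-moments dispersion estimate, the single pooled prior sample, the assumption of equal library sizes, and the fixed pattern prior `p_de` are hypothetical choices made for illustration; this is not the baySeq package's API or its exact estimation procedure.

```python
import math
from statistics import mean, variance
from typing import List, Sequence, Tuple

# A sampled empirical prior: a list of (mean, dispersion) pairs.
Prior = List[Tuple[float, float]]


def nb_logpmf(k: int, mu: float, phi: float) -> float:
    """Log P(K = k) for a negative binomial with mean mu and dispersion phi,
    parameterised so that Var(K) = mu + phi * mu**2."""
    r = 1.0 / phi  # the usual NB "size" parameter
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))


def empirical_prior(data: Sequence[Sequence[int]],
                    min_phi: float = 1e-3) -> Prior:
    """HYPOTHETICAL prior: one (mu, phi) pair per row of the dataset,
    estimated by method of moments (not baySeq's own estimator).
    Assumes equal library sizes, so raw counts are compared directly."""
    prior = []
    for row in data:
        m = max(float(mean(row)), 1e-3)  # avoid a zero mean for all-zero rows
        v = float(variance(row))
        phi = max((v - m) / m ** 2, min_phi)  # variance in excess of Poisson
        prior.append((m, phi))
    return prior


def log_marginal(counts: Sequence[int], prior: Prior) -> float:
    """Log of the average, over the prior sample, of the joint NB likelihood
    of `counts`: a numerical approximation to integrating out (mu, phi)."""
    logs = [sum(nb_logpmf(k, mu, phi) for k in counts) for mu, phi in prior]
    top = max(logs)  # log-sum-exp trick for numerical stability
    return top + math.log(sum(math.exp(x - top) for x in logs) / len(logs))


def posterior_de(row: Sequence[int], groups: Sequence[int], prior: Prior,
                 p_de: float = 0.1) -> float:
    """Posterior probability of the 'differentially expressed' pattern
    (one parameter set per group) versus the 'equal' pattern (one shared set).
    p_de is a HYPOTHETICAL fixed prior probability of differential expression."""
    log_equal = log_marginal(row, prior)
    log_de = sum(
        log_marginal([k for k, g in zip(row, groups) if g == label], prior)
        for label in sorted(set(groups)))
    a = math.log(p_de) + log_de
    b = math.log(1.0 - p_de) + log_equal
    top = max(a, b)
    return math.exp(a - top) / (math.exp(a - top) + math.exp(b - top))


if __name__ == "__main__":
    # Toy counts: five rows, three samples in each of two groups.
    data = [
        [10, 12, 9, 11, 10, 13],
        [50, 55, 48, 52, 49, 51],
        [5, 6, 4, 60, 58, 65],
        [100, 90, 110, 105, 95, 98],
        [0, 1, 0, 2, 1, 0],
    ]
    groups = [1, 1, 1, 2, 2, 2]
    prior = empirical_prior(data)
    for row in data:
        print(row, round(posterior_de(row, groups, prior), 3))
```

In this toy run, a row whose counts separate cleanly by group (the third row) should receive a higher posterior probability of differential expression than a row with similar counts in both groups (the first row). A fuller treatment would estimate the pattern priors from the data rather than fixing them, and would scale the means by each sample's library size; both steps are omitted here for brevity.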