Analytical distributions for stochastic gene expression

Abstract
Gene expression is significantly stochastic, which makes modeling of genetic networks challenging. We present an approximation that allows the calculation of not only the mean and variance, but also the distribution of protein numbers. We assume that proteins decay substantially more slowly than their mRNA, and we use high-throughput data from budding yeast to confirm that many genes satisfy this relation. For a two-stage model of gene expression, with transcription and translation as first-order reactions, we calculate the protein distribution for all times greater than several mRNA lifetimes and thus qualitatively predict the distribution of times for protein levels to first cross an arbitrary threshold. If, in addition, the promoter fluctuates between inactive and active states, we can find the steady-state protein distribution, which can be bimodal if fluctuations of the promoter are slow. We show that our assumptions imply that protein synthesis occurs in geometrically distributed bursts and allow mRNA to be eliminated from a master equation description. In general, we find that protein distributions are asymmetric and may be poorly characterized by their mean and variance. Through maximum likelihood methods, our expressions should therefore allow more quantitative comparisons with experimental data. More generally, we introduce a technique to derive a simpler, effective dynamics for a stochastic system by eliminating a fast variable.
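The claim that protein synthesis occurs in geometrically distributed bursts can be illustrated with a minimal simulation sketch. It assumes a single mRNA that is translated at a constant rate and degraded at a constant rate, and counts the proteins made before degradation. The names v1 (translation rate), d0 (mRNA degradation rate), b = v1/d0 (mean burst size), and the numerical values are illustrative assumptions, not notation or parameters taken from the text above.

```python
import random


def burst_size(v1, d0, rng):
    """Count proteins made by one mRNA before that mRNA is degraded.

    Each event is a translation with probability v1 / (v1 + d0) and a
    degradation otherwise; the count stops at the first degradation.
    v1 and d0 are hypothetical rate names chosen for this sketch.
    """
    p_translate = v1 / (v1 + d0)
    n = 0
    while rng.random() < p_translate:
        n += 1
    return n


def geometric_pmf(n, b):
    """Geometric distribution on n = 0, 1, 2, ... with mean b."""
    return (b / (1.0 + b)) ** n / (1.0 + b)


def empirical_burst_stats(v1=20.0, d0=5.0, samples=50_000, seed=0):
    """Return the sample mean burst size and the fraction of empty bursts."""
    rng = random.Random(seed)
    sizes = [burst_size(v1, d0, rng) for _ in range(samples)]
    mean = sum(sizes) / samples
    p_zero = sizes.count(0) / samples
    return mean, p_zero


if __name__ == "__main__":
    b = 20.0 / 5.0
    mean, p_zero = empirical_burst_stats()
    print("sample mean burst size:", mean, "expected:", b)
    print("sample P(0):", p_zero, "geometric P(0):", geometric_pmf(0, b))
```

The sketch checks only the burst size distribution. Treating each burst as instantaneous additionally relies on the stated assumption that proteins decay much more slowly than mRNA.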