Framework for the Rapid Optimization of Soluble Protein Expression in Escherichia coli Combining Microscale Experiments and Statistical Experimental Design

Abstract
A major bottleneck in drug discovery is the production of soluble human recombinant protein in sufficient quantities for analysis. This problem is compounded by the complex relationship between protein yield and the large number of variables that affect it. Here, we describe a generic framework for the rapid identification and optimization of factors affecting soluble protein yield in microwell plate fermentations as a prelude to the predictive and reliable scale-up of optimized culture conditions. Recombinant expression of firefly luciferase in Escherichia coli was used as a model system. Two rounds of statistical design of experiments (DoE) were employed to first screen (D-optimal design) and then optimize (central composite face design) the yield of soluble protein. Biological variables in the initial screening experiments included medium type and growth and induction conditions. To provide insight into the impact of the engineering environment on cell growth and expression, plate geometry, shaking speed, and liquid fill volume were included as factors, since these strongly influence oxygen transfer into the wells. Compared to standard reference conditions, the screening and optimization designs each gave up to 3-fold increases in the soluble protein yield, i.e., up to a 9-fold increase overall. In general, the highest protein yields were obtained when cells were induced at a relatively low biomass concentration and then allowed to grow slowly up to a high final biomass concentration, >8 g L⁻¹. Consideration and analysis of the model results showed 6 of the original 10 variables to be important at the screening stage and 3 after optimization. The latter included the microwell plate shaking speeds pre- and postinduction, indicating the importance of oxygen transfer into the microwells and identifying this as a critical parameter for subsequent scale translation studies.
The optimization process, also known as response surface methodology (RSM), predicted a distinct optimum set of conditions for protein expression, which could be verified experimentally. This work provides a generic approach to protein expression optimization in which both biological and engineering variables are investigated from the initial screening stage. The application of DoE reduces the total number of experiments that must be performed, while experimentation at the microwell scale increases experimental throughput and reduces cost.
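
To illustrate how a central composite face design keeps the experiment count low, the following minimal Python sketch generates the design points in coded units (-1, 0, +1). It is not the authors' implementation: the function name `ccf_design`, the choice of three factors, and the single center point are assumptions made here for illustration, since the abstract does not state the number of factors or center-point replicates used in the optimization design.

```python
from itertools import product


def ccf_design(n_factors, n_center=1):
    """Face-centered central composite design in coded units (-1, 0, +1).

    Hypothetical helper for illustration only; real studies usually add
    replicated center points and randomize the run order.
    """
    # Corner points: two-level full factorial over all factors.
    corners = [list(p) for p in product((-1, 1), repeat=n_factors)]

    # Axial (star) points: one factor at -1 or +1, the rest at 0.
    # Placing them on the faces of the cube (alpha = 1) is what makes
    # the design "face-centered".
    axial = []
    for i in range(n_factors):
        for level in (-1, 1):
            point = [0] * n_factors
            point[i] = level
            axial.append(point)

    # Center point(s): every factor at its mid level.
    center = [[0] * n_factors for _ in range(n_center)]

    return corners + axial + center


# Assumed example: three factors, one center point.
design = ccf_design(3)
full_three_level = 3 ** 3  # runs in a three-level full factorial

print(len(design), "runs versus", full_three_level, "for a full factorial")
```

Under these assumptions the design needs 15 runs (8 corners, 6 face centers, 1 center point) instead of the 27 required by a three-level full factorial, while still supporting the quadratic model that a response surface analysis fits.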