An Automated Phenotype-Driven Approach (GeneForce) for Refining Metabolic and Regulatory Models

Abstract
Integrated constraint-based metabolic and regulatory models can accurately predict cellular growth phenotypes arising from genetic and environmental perturbations. Constructing such models is challenging for two reasons: information about interactions between transcription factors and their target genes is limited, and computational methods for quickly refining models with additional datasets are lacking. In this study, we developed an algorithm, GeneForce, to identify incorrect regulatory rules and gene-protein-reaction associations in integrated metabolic and regulatory models. We applied the algorithm to refine integrated models of Escherichia coli and Salmonella typhimurium, and experimentally validated some of the refinements it suggested. The adjusted E. coli model showed improved accuracy (∼80.0%) in predicting growth phenotypes for 50,557 cases (knockout mutants tested for growth in different environmental conditions). In addition to identifying needed model corrections, the algorithm was used to identify native E. coli genes that, if overexpressed, would allow E. coli to grow in new environments. We envision that this approach will enable the rapid development and assessment of genome-scale metabolic and regulatory network models for less-characterized organisms, since such models can be constructed from genome annotations and cis-regulatory network predictions.
Computational models of biological networks are useful for explaining experimental observations and predicting phenotypic behaviors. Constructing genome-scale metabolic and regulatory models is still labor-intensive, even with genome sequences and high-throughput datasets available. Because our knowledge of biological systems is incomplete, these models are iteratively refined and validated as new connections in biological networks are discovered and as inconsistencies between model predictions and experimental observations are eliminated.
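A minimal sketch of how prediction accuracy over a phenotype screen like the one described above might be scored, and how the mismatched cases (the discrepancies a refinement method such as GeneForce targets) can be collected. It assumes each case is a (knockout, condition) pair with a binary growth/no-growth call; the function name and the gene and condition labels are hypothetical.

```python
from collections import Counter


def score_phenotypes(predicted, observed):
    """Compare predicted and observed growth calls for (knockout, condition) cases.

    Both arguments map a case key to True (growth) or False (no growth).
    Returns the confusion counts, the overall accuracy, and the mismatched cases.
    """
    counts = Counter()
    mismatches = []
    for case, grows in observed.items():
        pred = predicted[case]
        if pred and grows:
            counts["TP"] += 1
        elif not pred and not grows:
            counts["TN"] += 1
        elif pred and not grows:
            counts["FP"] += 1  # model predicts growth; no growth observed
            mismatches.append(case)
        else:
            counts["FN"] += 1  # model predicts no growth; growth observed
            mismatches.append(case)
    total = sum(counts.values())
    accuracy = (counts["TP"] + counts["TN"]) / total if total else 0.0
    return counts, accuracy, mismatches


# Hypothetical screen: two knockouts tested on two carbon sources.
observed = {("gA", "glucose"): True, ("gA", "acetate"): False,
            ("gB", "glucose"): True, ("gB", "acetate"): True}
predicted = {("gA", "glucose"): True, ("gA", "acetate"): False,
             ("gB", "glucose"): True, ("gB", "acetate"): False}

counts, accuracy, mismatches = score_phenotypes(predicted, observed)
print(dict(counts), accuracy, mismatches)
# {'TP': 2, 'TN': 1, 'FN': 1} 0.75 [('gB', 'acetate')]
```

In this framing, a false negative may indicate that the model switches off something the cell actually uses (for example, through an overly restrictive regulatory rule or gene-protein-reaction association), while a false positive may indicate a missing constraint.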
To enable researchers to quickly determine the causes of discrepancies between observed phenotypes and model predictions, we developed a new approach (GeneForce) that automatically corrects integrated metabolic and transcriptional regulatory network models. To illustrate its utility, we applied the method to well-curated models of E. coli metabolism and regulation. The approach significantly improved the accuracy of phenotype predictions and suggested specific changes to the metabolic and/or regulatory models. We also used it to identify genes whose overexpression could rescue non-growth phenotypes, and to evaluate the conservation of transcriptional regulatory interactions between E. coli and S. typhimurium. By hypothesizing the network changes required to reconcile model predictions with experimental data, the approach helps facilitate the development of new genome-scale models.
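To make the idea of "forcing" genes concrete, here is a toy sketch, not the authors' formulation. It assumes Boolean gene-protein-reaction (GPR) rules, a single hypothetical regulatory rule, and a hypothetical set of required reactions per condition standing in for flux balance analysis. For a case predicted not to grow, it searches for the smallest sets of regulator-silenced genes whose forced expression restores predicted growth; such a hit flags the corresponding regulatory rule as a candidate for correction, or the gene as a candidate for overexpression. All gene, reaction, and condition names are hypothetical, and the brute-force enumeration is used only for readability; a genome-scale model would likely require an optimization-based search instead.

```python
from itertools import combinations

# Hypothetical toy model; names and rules are illustrative only.
GENES = {"g1", "g2", "g3", "g4"}
GPR = {  # reaction -> alternative complexes (an OR of ANDs)
    "R_uptake": [{"g1"}],
    "R_cat": [{"g2", "g3"}, {"g4"}],
}
REQUIRED = {  # condition -> reactions needed for growth (stand-in for FBA)
    "glucose": {"R_uptake"},
    "acetate": {"R_uptake", "R_cat"},
}
REPRESSED = {"acetate": {"g4"}}  # toy regulatory rule: g4 is OFF on acetate


def active_reactions(genes_on):
    """A reaction is active if every gene of at least one complex is on."""
    return {r for r, complexes in GPR.items()
            if any(c <= genes_on for c in complexes)}


def predicts_growth(knockout, condition, forced=frozenset()):
    """Forced genes override repression, but never the knockout itself."""
    genes_on = ((GENES - REPRESSED.get(condition, set())) | set(forced)) - {knockout}
    return REQUIRED[condition] <= active_reactions(genes_on)


def minimal_forced_sets(knockout, condition, max_size=2):
    """Smallest sets of repressed genes whose forced expression restores growth."""
    candidates = sorted(REPRESSED.get(condition, set()) - {knockout})
    for size in range(1, max_size + 1):
        hits = [combo for combo in combinations(candidates, size)
                if predicts_growth(knockout, condition, frozenset(combo))]
        if hits:
            return hits
    return []


print(predicts_growth("g2", "acetate"))      # False: g4 is repressed on acetate
print(minimal_forced_sets("g2", "acetate"))  # [('g4',)]
```

If a g2 knockout were observed to grow on acetate, this search would point at the rule repressing g4 on acetate as the likely culprit.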