StanDep: Capturing transcriptomic variability improves context-specific metabolic models

Abstract
Diverse algorithms can integrate transcriptomics with genome-scale metabolic models (GEMs) to build context-specific metabolic models. These algorithms require a list of high-confidence (core) reactions identified from transcriptomics, but the parameters used to identify core reactions, such as the thresholding of expression profiles, can significantly change model content. Importantly, current thresholding approaches typically apply a single arbitrary threshold to all genes, which results in the removal of enzymes needed in small amounts and even of many housekeeping genes. Here, we describe StanDep, a novel heuristic method for using transcriptomics to identify core reactions prior to building context-specific metabolic models. StanDep clusters genes based on their expression patterns across different contexts and determines a threshold for each cluster using data-dependent statistics, specifically the standard deviation and mean. To demonstrate the use of StanDep, we built hundreds of models for the NCI-60 cancer cell lines. These models included more housekeeping reactions, which are often lost in models built using standard thresholding approaches. StanDep also provided a transcriptomic explanation for the inclusion of lowly expressed reactions that were otherwise supported only by model extraction methods. Our study also provides novel insights into how cells may deal with context-specific and ubiquitous functions. StanDep is available as a MATLAB toolbox at https://github.com/LewisLabUCSD/StanDep

Author summary
Integrating transcriptomics data with genome-scale metabolic models is appealing but challenging because of the number of parametric decisions the user must make. This challenge is compounded when models fail to capture functions that are important for cellular maintenance.
In this study, we propose a thresholding method for determining whether a metabolic reaction should be considered active. We used our method to extract models of NCI-60 cancer cell lines, human tissues, and C. elegans cell types. We show that our thresholding method improves the coverage of functions required for cellular maintenance. We also validated models built with our approach, and compared them with models built using existing approaches, against CRISPR-Cas9 essentiality screens. Overall, our study provides novel insights into how cells may deal with context-specific and ubiquitous functions.
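To make the idea of per-cluster thresholds concrete, the following is a minimal Python sketch under explicit assumptions; it is not StanDep's implementation, which is a MATLAB toolbox. Two simplifications are hypothetical: the clustering step is replaced by an equal-size split of genes ranked by mean log2 expression across contexts, and each cluster's threshold is assumed to be mean minus k times the standard deviation of the log2 values pooled over that cluster. The function names, the pseudocount, and the parameter `k` are likewise illustrative only.

```python
import math
import statistics


def log_expr(values, pseudocount=1.0):
    # log2(x + pseudocount); the pseudocount is an illustrative assumption
    return [math.log2(v + pseudocount) for v in values]


def assign_clusters(expr, n_clusters=2):
    """Hypothetical stand-in for StanDep's clustering: rank genes by mean
    log2 expression across contexts and split the ranking into equal-size groups."""
    genes = sorted(expr, key=lambda g: statistics.mean(log_expr(expr[g])))
    size = math.ceil(len(genes) / n_clusters)
    return {g: i // size for i, g in enumerate(genes)}


def cluster_thresholds(expr, clusters, k=1.0):
    """Assumed rule: threshold = mean - k * std of all log2 values pooled
    over the cluster's genes and contexts (not the published formula)."""
    pooled = {}
    for gene, c in clusters.items():
        pooled.setdefault(c, []).extend(log_expr(expr[gene]))
    return {c: statistics.mean(v) - k * statistics.pstdev(v) for c, v in pooled.items()}


def core_genes(expr, contexts, n_clusters=2, k=1.0):
    """Return, for each context, the genes whose log2 expression reaches
    the threshold of the cluster they belong to."""
    clusters = assign_clusters(expr, n_clusters)
    thresholds = cluster_thresholds(expr, clusters, k)
    core = {ctx: set() for ctx in contexts}
    for gene, values in expr.items():
        for ctx, lv in zip(contexts, log_expr(values)):
            if lv >= thresholds[clusters[gene]]:
                core[ctx].add(gene)
    return core


# Toy data (made up for illustration): rows are genes, columns are contexts A, B, C
contexts = ["A", "B", "C"]
expr = {
    "lowA": [1, 1, 7],        # low and context-specific
    "lowB": [3, 3, 3],        # low but uniform (housekeeping-like)
    "highA": [255, 255, 15],  # high and context-specific
    "highB": [63, 63, 63],    # high and uniform
}
core = core_genes(expr, contexts)
```

On this toy data the two uniformly expressed genes (`lowB`, `highB`) are called core in every context, while the variable genes are core only in the contexts where they are relatively high, even though `lowB` never exceeds the overall average expression. Setting `n_clusters=1` collapses the sketch to a single global threshold, the setting the abstract argues against.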