PLOS Computational Biology
ISSN / EISSN : 1553734X / 15537358
Current Publisher: Public Library of Science (PLoS) (10.1371)
Total articles ≅ 7,005
Google Scholar h5-index: 79
Latest articles in this journal
PLOS Computational Biology, Volume 15; doi:10.1371/journal.pcbi.1007338
Abstract:T cells use their T-cell receptors (TCRs) to scan other cells for antigenic peptides presented by MHC molecules (pMHC). If a TCR encounters a pMHC, it can trigger a signalling pathway that could lead to the activation of the T cell and the initiation of an immune response. It is currently not clear how the binding of pMHC to the TCR initiates signalling within the T cell. One hypothesis is that conformational changes in the TCR lead to further downstream signalling. Here we investigate four different TCRs in their free state as well as in their pMHC bound state using large scale molecular simulations totalling 26 000 ns. We find that the dynamical features within TCRs differ significantly between unbound TCR and TCR/pMHC simulations. However, apart from expected results such as reduced solvent accessibility and flexibility of the interface residues, these features are not conserved among different TCR types. The presence of a pMHC alone is not sufficient to cause cross-TCR-conserved dynamical features within a TCR. Our results argue against models of TCR triggering involving conserved allosteric conformational changes. The interaction between T-cells and other cells is one of the most important interactions in the human immune system. If T-cells are not triggered major parts of the immune system cannot be activated or are not working effectively. Despite many years of research the exact mechanism of how a T-cell is initially triggered is not clear. One hypothesis is that conformational changes within the T-cell receptor (TCR) can cause further downstream signalling within the T-cell. In this study we computationally investigate the dynamics of four different TCRs in their free and bound configuration. Our large scale simulations show that all four TCRs react to binding in different ways. In some TCRs mainly the areas close to the binding region are affected while in other TCRs areas further apart from the binding region are also affected. 
Our results argue against a conserved structural activation mechanism across different types of TCRs.
PLOS Computational Biology, Volume 15; doi:10.1371/journal.pcbi.1007158
Abstract:Chemotherapy resistance is a major challenge to the effective treatment of cancer. Thus, a systematic pipeline for the efficient identification of effective combination treatments could bring huge biomedical benefit. In order to facilitate rational design of combination therapies, we developed a comprehensive computational model that incorporates the available biological knowledge and relevant experimental data on the life-and-death response of individual cancer cells to cisplatin or cisplatin combined with the TNF-related apoptosis-inducing ligand (TRAIL). The model’s predictions, that a combination treatment of cisplatin and TRAIL would enhance cancer cell death and exhibit a “two-wave killing” temporal pattern, were validated by measuring the dynamics of p53 accumulation, cell fate, and cell death in single cells. The validated model was then subjected to a systematic analysis with an ensemble of diverse machine learning methods. Though each method is characterized by a different algorithm, they collectively identified several molecular players that can sensitize tumor cells to cisplatin-induced apoptosis (sensitizers). The identified sensitizers are consistent with previous experimental observations. Overall, we have illustrated that machine learning analysis of an experimentally validated mechanistic model can convert our available knowledge into the identity of biologically meaningful sensitizers. This knowledge can then be leveraged to design treatment strategies that could improve the efficacy of chemotherapy. Combination chemotherapy is frequently used in the fight against cancer, as treatment with multiple chemotherapy drugs of different molecular mechanisms reduces the chance of resistance. The complex mechanisms involved make it essential to develop a comprehensive computational model that comprehends experimental data and biological knowledge to facilitate design of combination therapies.
As computational models grow and capture more and more molecular events governing the chemotherapy response, it becomes harder to explore the treatment space efficiently and systematically. To facilitate the extraction of unbiased solutions from complicated models, we have conducted systematic analysis using a series of machine learning methods including Partial Least Squares regression (PLS), Random forest (RF), Logistic Regression (LR) and Support Vector Machine (SVM). The results of these different methods were cross-validated to reduce the chance of overfitting or bias by any single method. Overall, we propose a novel computational pipeline, where machine learning analysis of experimentally validated models is used to generate unbiased predictions of novel chemotherapy targets.
PLOS Computational Biology, Volume 15; doi:10.1371/journal.pcbi.1007326
Abstract:Value-based decision-making involves trading off the cost associated with an action against its expected reward. Research has shown that both physical and mental effort constitute such subjective costs, biasing choices away from effortful actions, and discounting the value of obtained rewards. Facing conflicts between competing action alternatives is considered aversive, as recruiting cognitive control to overcome conflict is effortful. Moreover, engaging control to proactively suppress irrelevant information that could conflict with task-relevant information would presumably also be cognitively costly. Yet, it remains unclear whether the cognitive control demands involved in preventing and resolving conflict also constitute costs in value-based decisions. The present study investigated this question by embedding irrelevant distractors (flanker arrows) within a reversal-learning task, with intermixed free and instructed trials. Results showed that participants learned to adapt their free choices to maximize rewards, but were nevertheless biased to follow the suggestions of irrelevant distractors. Thus, the perceived cost of investing cognitive control to suppress an external suggestion could sometimes trump internal value representations. By adapting computational models of reinforcement learning, we assessed the influence of conflict at both the decision and learning stages. Modelling the decision showed that free choices were more biased when participants were less sure about which action was more rewarding. This supports the hypothesis that the costs linked to conflict management were traded off against expected rewards. During the learning phase, we found that learning rates were reduced in instructed, relative to free, choices. Learning rates were further reduced by conflict between an instruction and subjective action values, whereas learning was not robustly influenced by conflict between one’s actions and external distractors. 
Our results show that the subjective cognitive control costs linked to conflict factor into value-based decision-making, and highlight that different types of conflict may have different effects on learning about action outcomes. Value-based decision-making involves trading off the cost associated with an action–such as physical or mental effort–against its expected reward. Although facing conflicts between competing action alternatives is considered aversive and effortful, it remains unclear whether conflict also constitutes a cost in value-based decisions. We tested this hypothesis by combining a classic conflict (flanker) task with a reinforcement-learning task. Results showed that participants learned to maximise their earnings, but were nevertheless biased to follow irrelevant suggestions. Computational model-based analyses showed a greater choice bias with more uncertainty about the best action to make, supporting the hypothesis that the costs linked to conflict management were traded off against...
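The reinforcement-learning ideas in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the delta-rule update, the logistic choice rule with an additive bias towards the distractor's suggestion, and every name and parameter value below (`update_value`, `p_follow_suggestion`, `ALPHA_FREE`, `ALPHA_INSTRUCTED`, `beta`, `bias`) are assumptions chosen for illustration only.

```python
from math import exp


def update_value(q, reward, alpha):
    """Delta-rule update: move the action value towards the obtained reward."""
    return q + alpha * (reward - q)


def p_follow_suggestion(q_suggested, q_other, beta, bias):
    """Probability of choosing the action suggested by the distractor.

    beta scales the value difference (inverse temperature); bias is a
    hypothetical additive pull towards the suggested action.
    """
    drive = beta * (q_suggested - q_other) + bias
    return 1.0 / (1.0 + exp(-drive))


# Hypothetical learning rates: lower for instructed than for free trials.
ALPHA_FREE, ALPHA_INSTRUCTED = 0.4, 0.2
q_after_free = update_value(0.5, 1.0, ALPHA_FREE)
q_after_instructed = update_value(0.5, 1.0, ALPHA_INSTRUCTED)

# The same bias shifts choice more when the two action values are close
# (high uncertainty) than when one action is clearly better.
shift_uncertain = (p_follow_suggestion(0.5, 0.5, 5.0, 1.0)
                   - p_follow_suggestion(0.5, 0.5, 5.0, 0.0))
shift_certain = (p_follow_suggestion(0.9, 0.1, 5.0, 1.0)
                 - p_follow_suggestion(0.9, 0.1, 5.0, 0.0))
```

In this toy setting a fixed bias has its largest effect when the value difference is near zero, which is one simple way a choice bias could grow with uncertainty about the better action.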
PLOS Computational Biology, Volume 15; doi:10.1371/journal.pcbi.1007310
Abstract:Deciphering the mechanisms of regulation of metabolic networks subjected to perturbations, including disease states and drug-induced stress, relies on tracing metabolic fluxes. Among the most informative data for predicting metabolic fluxes are 13C-based metabolomics measurements, which provide information about how carbons are redistributed along central carbon metabolism. Such data can be integrated using 13C Metabolic Flux Analysis (13C MFA) to provide quantitative metabolic maps of flux distributions. However, 13C MFA might be unable to reduce the solution space towards a unique solution either in large metabolic networks or when small sets of measurements are integrated. Here we present parsimonious 13C MFA (p13CMFA), an approach that runs a secondary optimization in the 13C MFA solution space to identify the solution that minimizes the total reaction flux. Furthermore, flux minimization can be weighted by gene expression measurements, allowing seamless integration of gene expression data with 13C data. As proof of concept, we demonstrate how p13CMFA can be used to estimate intracellular flux distributions from 13C measurements and transcriptomics data. We have implemented p13CMFA in Iso2Flux, our in-house developed isotopic steady-state 13C MFA software. The source code is freely available on GitHub (https://github.com/cfoguet/iso2flux/releases/tag/0.7.2). 13C Metabolic Flux Analysis (13C MFA) is a well-established technique that has proven to be a valuable tool in quantifying the metabolic flux profile of central carbon metabolism. When a biological system is incubated with a 13C-labeled substrate, 13C propagates to metabolites throughout the metabolic network in a flux and pathway-dependent manner. 13C MFA integrates measurements of 13C enrichment in metabolites to identify the flux distributions consistent with the measured 13C propagation. However, there is often a range of flux values that can lead to the observed 13C distribution.
Indeed, either when the metabolic network is large or when a small set of measurements is integrated, the range of valid solutions can be too wide to accurately estimate part of the underlying flux distribution. Here we propose to use flux minimization to select the best flux solution in the 13C MFA solution space. Furthermore, this approach can integrate gene expression data to give greater weight to the minimization of fluxes through enzymes with low gene expression evidence in order to ensure that the selected solution is biologically relevant. The concept of using flux minimization to select the best solution is widely used in flux balance analysis, but it had never been applied in the framework of 13C MFA. We have termed this new approach parsimonious 13C MFA (p13CMFA).
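The two-stage idea described above (first fit the 13C data, then pick the lowest-flux solution among those that fit) can be sketched on a toy problem. This is not the Iso2Flux implementation or its API: the brute-force grid search stands in for a real optimizer, and the three-flux network, the `residual` function, the tolerances and the weights are all hypothetical.

```python
def parsimonious_select(candidates, residual, weights, tolerance):
    """Among candidates that fit nearly as well as the best one,
    return the one with the smallest weighted sum of absolute fluxes."""
    best = min(residual(v) for v in candidates)
    feasible = [v for v in candidates if residual(v) <= best + tolerance]
    return min(feasible,
               key=lambda v: sum(w * abs(x) for w, x in zip(weights, v)))


# Toy network: uptake of 10 units split between route A (v_a) and
# route B (v_b), plus a futile cycle (v_cycle) invisible to the labelling data.
UPTAKE = 10
candidates = [(v_a, UPTAKE - v_a, v_cycle)
              for v_a in range(UPTAKE + 1)
              for v_cycle in range(6)]


def residual(v):
    """Squared error against a hypothetical measured label fraction of 0.3,
    assumed here to equal the share of flux through route A."""
    v_a, v_b, _ = v
    return (v_a / (v_a + v_b) - 0.3) ** 2


# Unweighted minimization removes the futile cycle the data cannot constrain.
unweighted = parsimonious_select(candidates, residual, (1, 1, 1), 1e-9)

# With a looser fit tolerance, a larger weight on route B (standing in for
# low gene expression of its enzyme) shifts flux towards route A.
weighted = parsimonious_select(candidates, residual, (1, 3, 1), 0.02)
```

Here `unweighted` is `(3, 7, 0)` and `weighted` is `(4, 6, 0)`. In a realistic setting the candidate set would come from a constrained optimization over the full solution space rather than an enumerated grid.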
PLOS Computational Biology, Volume 15; doi:10.1371/journal.pcbi.1006883
Abstract:How muscles are used is a key to understanding the internal driving of fish swimming. However, the underlying mechanisms of some features of the muscle activation patterns and their differential appearance in different species are still obscure. In this study, we explain the muscle activation patterns by using 3D computational fluid dynamics models coupled to the motion of fish with prescribed deformation and examining the torque and power required along the fish body with two primary swimming modes. We find that the torque required by the hydrodynamic forces and body inertia exhibits a wave pattern that travels faster than the curvature wave in both anguilliform and carangiform swimmers, which can explain the traveling wave speeds of the muscle activations. Notably, intermittent negative power (i.e., power delivered by the fluid to the body) on the posterior part, along with a timely transfer of torque and energy by tendons, explains the decrease in the duration of muscle activation towards the tail. The torque contribution from the body elasticity further clarifies the wave speed increase or the reverse of the wave direction of the muscle activation on the posterior part of a carangiform swimmer. For anguilliform swimmers, the absence of the aforementioned changes in the muscle activation on the posterior part is consistent with our torque prediction and the absence of long tendons from experimental observations. These results provide novel insights into the functions of muscles and tendons as an integral part of the internal driving system, especially from an energy perspective, and they highlight the differences in the internal driving systems between the two primary swimming modes. For undulatory swimming, fish form posteriorly traveling waves of body bending by activating their muscles sequentially along the body. However, experimental observations have shown that the muscle activation wave does not simply match the bending wave. 
Researchers have previously computed the torque required for muscles along the body based on classic hydrodynamic theories and explained the higher wave speed of the muscle activation compared to the curvature wave. However, the origins of other features of the muscle activation pattern and their variation among different species are still obscure after decades of research. In this study, we use 3D computational fluid dynamics models to compute the spatiotemporal distributions of both the torque and power required for eel-like and mackerel-like swimming. By examining both the torque and power patterns and considering the energy transfer, storage, and release by tendons and body viscoelasticity, we can explain not only the features and variations in the muscle activation patterns as observed from fish experiments but also how tendons and body elasticity save energy. We provide a mechanical picture in which the body shape, body movement, muscles, tendons, and body elasticity of a mackerel (or similar)...
PLOS Computational Biology, Volume 15; doi:10.1371/journal.pcbi.1007290
Abstract:Across diverse biological systems, ranging from neural networks to intracellular signaling and genetic regulatory networks, the information about changes in the environment is frequently encoded in the full temporal dynamics of the network nodes. A pressing data-analysis challenge has thus been to efficiently estimate the amount of information that these dynamics convey from experimental data. Here we develop and evaluate decoding-based estimation methods to lower bound the mutual information about a finite set of inputs, encoded in single-cell high-dimensional time series data. For biological reaction networks governed by the chemical Master equation, we derive model-based information approximations and analytical upper bounds, against which we benchmark our proposed model-free decoding estimators. In contrast to the frequently-used k-nearest-neighbor estimator, decoding-based estimators robustly extract a large fraction of the available information from high-dimensional trajectories with a realistic number of data samples. We apply these estimators to previously published data on Erk and Ca2+ signaling in mammalian cells and to yeast stress-response, and find that a substantial amount of information about environmental state can be encoded by non-trivial response statistics even in stationary signals. We argue that these single-cell, decoding-based information estimates, rather than the commonly-used tests for significant differences between selected population response statistics, provide a proper and unbiased measure for the performance of biological signaling networks. Cells represent changes in their own state or in the state of their environment by temporally varying the concentrations of intracellular signaling molecules, mimicking in a simple chemical context the way we humans represent our thoughts and observations through temporally varying patterns of sounds that constitute speech.
These time-varying concentrations are used as signals to regulate downstream molecular processes, to mount appropriate cellular responses to environmental challenges, or to communicate with nearby cells. But how precise and unambiguous is such chemical communication, in theory and in data? On the one hand, intuition tells us that many possible environmental changes could be represented by variation in concentration patterns of multiple signaling chemicals; on the other, we know that chemical signals are inherently noisy at the molecular scale. Here we develop data analysis methodology that allows us to pose and answer these questions rigorously. Our decoding-based information estimators, which we test on simulated and real data from yeast and mammalian cells, measure how precisely individual cells can detect and report environmental changes, without making assumptions about the structure of the chemical communication and using only the amount of data that is typically available in today’s experiments.
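A decoding-based lower bound of the kind described above can be sketched as follows: train a decoder that guesses the input from a trajectory, tabulate true versus decoded inputs on held-out data, and compute the mutual information of that table. By the data-processing inequality this cannot exceed the information carried by the trajectories themselves. The nearest-centroid decoder, the toy trajectories and all function names are assumptions for illustration, not the estimators used in the paper, and the plug-in estimate ignores finite-sample bias.

```python
from math import log2


def nearest_centroid_decoder(train):
    """train maps each input label to a list of equal-length trajectories."""
    centroids = {label: [sum(vals) / len(vals) for vals in zip(*trajs)]
                 for label, trajs in train.items()}

    def decode(traj):
        # Pick the label whose mean trajectory is closest in squared distance.
        return min(centroids,
                   key=lambda lab: sum((a - b) ** 2
                                       for a, b in zip(traj, centroids[lab])))
    return decode


def confusion_matrix(decode, test, labels):
    """Rows index the true input, columns the decoded input."""
    index = {lab: k for k, lab in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for lab, trajs in test.items():
        for traj in trajs:
            counts[index[lab]][index[decode(traj)]] += 1
    return counts


def mutual_information_bits(confusion):
    """Plug-in mutual information between true and decoded labels, in bits."""
    total = sum(sum(row) for row in confusion)
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(col) for col in zip(*confusion)]
    mi = 0.0
    for i, row in enumerate(confusion):
        for j, n in enumerate(row):
            if n:
                mi += (n / total) * log2(n * total / (row_sums[i] * col_sums[j]))
    return mi


# Hypothetical single-cell trajectories for two environmental inputs.
train = {"low": [[0.0, 0.1, 0.0], [0.1, 0.0, 0.1]],
         "high": [[1.0, 0.9, 1.0], [0.9, 1.0, 1.1]]}
test = {"low": [[0.05, 0.0, 0.1], [0.2, 0.1, 0.0]],
        "high": [[1.0, 1.0, 0.8], [0.9, 1.2, 1.0]]}

decode = nearest_centroid_decoder(train)
confusion = confusion_matrix(decode, test, ["low", "high"])
info_bits = mutual_information_bits(confusion)
```

Keeping the decoder's training data separate from the data used to build the confusion matrix matters in practice, since evaluating on the training set would inflate the estimate.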
PLOS Computational Biology, Volume 15; doi:10.1371/journal.pcbi.1007321
Abstract:We present a new computational model of speech motor control: the Feedback-Aware Control of Tasks in Speech or FACTS model. FACTS employs a hierarchical state feedback control architecture to control a simulated vocal tract and produce intelligible speech. The model includes higher-level control of speech tasks and lower-level control of speech articulators. The task controller is modeled as a dynamical system governing the creation of desired constrictions in the vocal tract, after Task Dynamics. Both the task and articulatory controllers rely on an internal estimate of the current state of the vocal tract to generate motor commands. This estimate is derived, based on efference copy of applied controls, from a forward model that predicts both the next vocal tract state and the expected auditory and somatosensory feedback. A comparison between predicted feedback and actual feedback is then used to update the internal state prediction. FACTS is able to qualitatively replicate many characteristics of the human speech system: the model is robust to noise in both the sensory and motor pathways, is relatively unaffected by a loss of auditory feedback but is more significantly impacted by the loss of somatosensory feedback, and responds appropriately to externally-imposed alterations of auditory and somatosensory feedback. The model also replicates previously hypothesized trade-offs between reliance on auditory and somatosensory feedback and shows for the first time how this relationship may be mediated by acuity in each sensory domain. These results have important implications for our understanding of the speech motor control system in humans. Speaking is one of the most complex motor tasks humans perform, but its neural and computational bases are not well understood. We present a new computational model that generates speech movements by comparing high-level language production goals with an internal estimate of the current state of the vocal tract.
This model reproduces many key human behaviors, including making appropriate responses to multiple types of external perturbations to sensory feedback, and makes a number of novel predictions about the speech motor system. These results have implications for our understanding of healthy speech as well as speech impairments caused by neurological disorders. They also suggest that the mechanisms of control are shared between speech and other motor domains.
PLOS Computational Biology, Volume 15; doi:10.1371/journal.pcbi.1007348
Abstract:Cellular microscopy images contain rich insights about biology. To extract this information, researchers use features, or measurements of the patterns of interest in the images. Here, we introduce a convolutional neural network (CNN) to automatically design features for fluorescence microscopy. We use a self-supervised method to learn feature representations of single cells in microscopy images without labelled training data. We train CNNs on a simple task that leverages the inherent structure of microscopy images and controls for variation in cell morphology and imaging: given one cell from an image, the CNN is asked to predict the fluorescence pattern in a second different cell from the same image. We show that our method learns high-quality features that describe protein expression patterns in single cells in both yeast and human microscopy datasets. Moreover, we demonstrate that our features are useful for exploratory biological analysis, by capturing high-resolution cellular components in a proteome-wide cluster analysis of human proteins, and by quantifying multi-localized proteins and single-cell variability. We believe paired cell inpainting is a generalizable method to obtain feature representations of single cells in multichannel microscopy images. To understand the cell biology captured by microscopy images, researchers use features, or measurements of relevant properties of cells, such as the shape or size of cells, or the intensity of fluorescent markers. Features are the starting point of most image analysis pipelines, so their quality in representing cells is fundamental to the success of an analysis. Classically, researchers have relied on features manually defined by imaging experts. In contrast, deep learning techniques based on convolutional neural networks (CNNs) automatically learn features, which can outperform manually-defined features at image analysis tasks.
However, most CNN methods require large manually-annotated training datasets to learn useful features, limiting their practical application. Here, we developed a new CNN method that learns high-quality features for single cells in microscopy images, without the need for any labeled training data. We show that our features surpass other comparable features in identifying protein localization from images, and that our method can generalize to diverse datasets. By exploiting our method, researchers will be able to automatically obtain high-quality features customized to their own image datasets, facilitating many downstream analyses, as we highlight by demonstrating many possible use cases of our features in this study.
PLOS Computational Biology, Volume 15; doi:10.1371/journal.pcbi.1006909
Abstract:Proteases are multifunctional, promiscuous enzymes that degrade proteins as well as peptides and drive important processes in health and disease. Current technology has enabled the construction of libraries of peptide substrates that detect protease activity, which provides valuable biological information. An ideal library would be orthogonal, such that each protease only hydrolyzes one unique substrate; however, this is impractical due to off-target promiscuity (i.e., one protease targets multiple different substrates). Therefore, when a library of probes is exposed to a cocktail of proteases, each protease activates multiple probes, producing a convoluted signature. Computational methods for parsing these signatures to estimate individual protease activities primarily use an extensive collection of all possible protease-substrate combinations, which require impractical amounts of training data when expanding to search for more candidate substrates. Here we provide a computational method for estimating protease activities efficiently by reducing the number of substrates and clustering proteases with similar cleavage activities into families. We envision that this method will be used to extract meaningful diagnostic information from biological samples. The activity of enzymatic proteins, which are called proteases, drives numerous important processes in health and disease, including cancer, immunity, and infectious disease. Many labs have developed useful diagnostics by designing sensors that measure the activity of these proteases. However, if we want to detect multiple proteases at the same time, it becomes impractical to design sensors that only detect one protease. This is due to a phenomenon called protease promiscuity, which means that proteases will activate multiple different sensors. Computational methods have been created to solve this problem, but the challenge is that these often require large amounts of training data.
Further, completely different proteases may be detected by the same subset of sensors. In this work, we design a computational method to overcome this problem by clustering similar proteases into "subfamilies", which increases estimation accuracy. Further, our method tests multiple combinations of sensors to maintain accuracy while minimizing the number of sensors used. Together, we envision that this work will increase the amount of useful information we can extract from biological samples, which may lead to better clinical diagnostics.
PLOS Computational Biology, Volume 15; doi:10.1371/journal.pcbi.1007276
Abstract:In-silico identification of potential target genes for disease is an essential aspect of drug target discovery. Recent studies suggest that successful targets can be found by leveraging genetic, genomic and protein interaction information. Here, we systematically tested the ability of 12 varied algorithms, based on network propagation, to identify genes that have been targeted by any drug, on gene-disease data from 22 common non-cancerous diseases in OpenTargets. We considered two biological networks, six performance metrics and compared two types of input gene-disease association scores. The impact of the design factors on performance was quantified through additive explanatory models. Standard cross-validation led to over-optimistic performance estimates due to the presence of protein complexes. In order to obtain realistic estimates, we introduced two novel protein complex-aware cross-validation schemes. When seeding biological networks with known drug targets, machine learning and diffusion-based methods found around 2-4 true targets within the top 20 suggestions. Seeding the networks with genes associated with disease by genetics decreased performance below 1 true hit on average. The use of a larger network, although noisier, improved overall performance. We conclude that diffusion-based prioritisers and machine learning applied to diffusion-based features are suited for drug discovery in practice and improve over simpler neighbour-voting methods. We also demonstrate the large impact of choosing an adequate validation strategy and the definition of seed disease genes. The use of biological network data has proven its effectiveness in many areas of computational biology. Networks consist of nodes, usually genes or proteins, and edges that connect pairs of nodes, representing information such as physical interactions, regulatory roles or co-occurrence.
In order to find new candidate nodes for a given biological property, the so-called network propagation algorithms start from the set of known nodes with that property and leverage the connections from the biological network to make predictions. Here, we assess the performance of several network propagation algorithms to find sensible gene targets, i.e. those that have been found promising enough to start clinical trials with any compound, for 22 common non-cancerous diseases. We focus on obtaining performance metrics that reflect a practical scenario in drug development where only a small set of genes can be assayed. We found that the presence of protein complexes biased the performance estimates, leading to over-optimistic conclusions, and introduced two novel strategies to address it. Our results support that network propagation is still a viable approach to find drug targets, but that special care needs to be taken with the validation strategy. Algorithms benefitted from the use of a larger, although noisier, network and of direct evidence data, rather than indirect genetic...
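Network propagation, as described in this summary, starts from seed nodes and spreads their scores along network edges. One common variant is a random walk with restart, sketched below on a four-gene path graph. It is offered only as an illustration: the abstract does not say which of the 12 algorithms work this way, and the graph, the gene names, the `restart` value and the function name are hypothetical.

```python
def random_walk_with_restart(neighbours, seeds, restart=0.5, iterations=100):
    """Propagate seed scores over an undirected graph.

    neighbours maps each node to a list of adjacent nodes. At every step a
    walker either jumps back to a seed (probability `restart`) or moves to
    a uniformly chosen neighbour of its current node.
    """
    start = {node: (1.0 / len(seeds) if node in seeds else 0.0)
             for node in neighbours}
    scores = dict(start)
    for _ in range(iterations):
        updated = {}
        for node in neighbours:
            # Score flowing in from each neighbour, split over its degree.
            inflow = sum(scores[nb] / len(neighbours[nb])
                         for nb in neighbours[node])
            updated[node] = (1.0 - restart) * inflow + restart * start[node]
        scores = updated
    return scores


# Hypothetical path graph A - B - C - D with a single known target, A.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(graph, seeds={"A"})

# Rank the non-seed genes as candidate targets, best first.
ranking = sorted((g for g in graph if g != "A"),
                 key=lambda g: scores[g], reverse=True)
```

On this graph the scores decay with distance from the seed, so the ranking is B, C, D. Evaluating such rankings fairly is where the complex-aware cross-validation discussed in the abstract would come in, since members of one protein complex sit close together in the network.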