Learning a hierarchical representation of the yeast transcriptomic machinery using an autoencoder model

Open Access

11 January 2016

journal article
conference paper
Published by Springer Science and Business Media LLC in BMC Bioinformatics

Vol. 17 (S1), 97-107
https://doi.org/10.1186/s12859-015-0852-1

Abstract

A living cell has a complex, hierarchically organized signaling system that encodes and assimilates diverse environmental and intracellular signals, and it further transmits signals that control cellular responses, including a tightly controlled transcriptional program. An important and yet challenging task in systems biology is to reconstruct the cellular signaling system in a data-driven manner. In this study, we investigate the utility of deep hierarchical neural networks in learning and representing the hierarchical organization of the yeast transcriptomic machinery. We have designed a sparse autoencoder model consisting of a layer of observed variables and four layers of hidden variables. We applied the model to over a thousand yeast microarrays to learn the encoding system of the yeast transcriptomic machinery. After model selection, we evaluated whether the trained models captured biologically sensible information. We show that the latent variables in the first hidden layer correctly captured the signals of yeast transcription factors (TFs), yielding a nearly one-to-one mapping between latent variables and TFs. We further show that genes regulated by latent variables at higher hidden layers are often involved in a common biological process, and that the hierarchical relationships between latent variables conform to existing knowledge. Finally, we show that the information captured by the latent variables provides more abstract and concise representations of each microarray, enabling the identification of better-separated clusters compared with gene-based representations. Contemporary deep hierarchical latent variable models, such as the autoencoder, can be used to partially recover the organization of the transcriptomic machinery.
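The abstract does not give the model's equations. As a rough illustration only, the sketch below assumes the textbook sparse-autoencoder formulation: sigmoid units, a squared-error reconstruction term, a KL-divergence sparsity penalty pulling each hidden unit's mean activation toward a target `rho`, and L2 weight decay. The helper names, the layer sizes (20 inputs feeding four hidden layers), and the values of `rho`, `beta`, and `lam` are hypothetical, not the paper's settings. It shows the objective for one layer and a forward pass through a four-hidden-layer stack; gradient-based training and the paper's model selection are omitted.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    """Hypothetical helper: small random weights W (n_out x n_in) and zero biases."""
    W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

def forward(x, W, b):
    """Sigmoid activation of each unit: sigmoid(W x + b)."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W, b)]

def kl_sparsity(rho, hidden_batch):
    """Sum over hidden units j of KL(rho || rho_hat_j), rho_hat_j = mean activation of unit j."""
    n = len(hidden_batch)
    total = 0.0
    for j in range(len(hidden_batch[0])):
        rho_hat = sum(h[j] for h in hidden_batch) / n
        total += rho * math.log(rho / rho_hat) + (1 - rho) * math.log((1 - rho) / (1 - rho_hat))
    return total

def sparse_ae_loss(batch, enc, dec, rho=0.05, beta=3.0, lam=1e-4):
    """One layer's objective: reconstruction error + beta * sparsity + lam * weight decay."""
    hidden = [forward(x, *enc) for x in batch]
    recon = [forward(h, *dec) for h in hidden]
    mse = sum(sum((xi - ri) ** 2 for xi, ri in zip(x, r))
              for x, r in zip(batch, recon)) / (2 * len(batch))
    decay = lam / 2 * sum(w * w for W in (enc[0], dec[0]) for row in W for w in row)
    return mse + beta * kl_sparsity(rho, hidden) + decay

def encode_stack(x, encoders):
    """Propagate one sample through the stacked encoders, returning every hidden layer."""
    reps, h = [], x
    for W, b in encoders:
        h = forward(h, W, b)
        reps.append(h)
    return reps

# Toy usage with made-up data: 20 "genes" scaled to [0, 1], four hidden layers.
rng = random.Random(0)
sizes = [20, 10, 8, 6, 4]  # hypothetical layer widths
encoders = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
batch = [[rng.random() for _ in range(sizes[0])] for _ in range(5)]
dec0 = init_layer(sizes[1], sizes[0], rng)  # decoder for the first layer only
print(sparse_ae_loss(batch, encoders[0], dec0))
print([len(h) for h in encode_stack(batch[0], encoders)])  # [10, 8, 6, 4]
```

Because the reconstruction units are sigmoids, this sketch assumes expression values have been rescaled into [0, 1]; in a layer-wise scheme, each trained layer's hidden activations would serve as the input for training the next layer.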
