A deep auto-encoder model for gene expression prediction

Open Access

17 November 2017

journal article
conference paper
Published by Springer Science and Business Media LLC in BMC Genomics

Vol. 18 (S9), 39-49
https://doi.org/10.1186/s12864-017-4226-0

Abstract

Gene expression is a key intermediate step through which genotypes lead to a particular trait. It is affected by various factors, including the genotypes of genetic variants. With the aim of delineating the genetic impact on gene expression, we build a deep auto-encoder model to assess how well genetic variants contribute to gene expression changes. This new deep learning model is a regression-based predictive model based on the MultiLayer Perceptron and Stacked Denoising Auto-encoder (MLP-SAE). The model is trained using a stacked denoising auto-encoder for feature selection and a multilayer perceptron framework for backpropagation. We further improve the model by introducing dropout to prevent overfitting and improve performance. To demonstrate the usage of this model, we apply MLP-SAE to a real genomic dataset with genotypes and gene expression profiles measured in yeast. Our results show that the MLP-SAE model with dropout outperforms other models, including Lasso, Random Forests and the MLP-SAE model without dropout. Using the MLP-SAE model with dropout, we show that gene expression quantifications predicted by the model solely from genotypes align well with true gene expression patterns. We provide a deep auto-encoder model for predicting gene expression from SNP genotypes. This study demonstrates that deep learning is appropriate for tackling another genomic problem, i.e., building predictive models to understand genotypes’ contribution to gene expression. With the emerging availability of richer genomic data, we anticipate that deep learning models will play a bigger role in modeling and interpreting genomics.
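
The training scheme described in the abstract (denoising auto-encoder pretraining, then supervised backpropagation through a multilayer perceptron with dropout) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a single hidden layer rather than a stack, tied encoder and decoder weights, binary genotype inputs, and synthetic data; all function names and hyperparameters (`pretrain_dae`, `fit_mlp_sae`, `corruption`, `p_drop`, learning rates, layer sizes) are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_dae(X, n_hidden, corruption=0.2, lr=0.1, epochs=50):
    """Unsupervised step: train one denoising auto-encoder layer (tied weights)."""
    n, d = X.shape
    W = rng.normal(0.0, 0.1, size=(d, n_hidden))
    b = np.zeros(n_hidden)   # encoder bias
    c = np.zeros(d)          # decoder bias
    for _ in range(epochs):
        mask = rng.random(X.shape) > corruption   # randomly zero some inputs
        Xc = X * mask
        H = sigmoid(Xc @ W + b)                   # encode the corrupted input
        R = sigmoid(H @ W.T + c)                  # reconstruct the clean input
        dR = (R - X) * R * (1.0 - R) / n          # squared-error gradient at decoder
        dH = (dR @ W) * H * (1.0 - H)             # gradient at encoder pre-activation
        W -= lr * (Xc.T @ dH + dR.T @ H)          # tied weights: sum both paths
        b -= lr * dH.sum(axis=0)
        c -= lr * dR.sum(axis=0)
    return W, b

def fit_mlp_sae(X, y, n_hidden=16, p_drop=0.5, lr=0.05, epochs=300):
    """Supervised step: fine-tune the pretrained layer plus a linear output with dropout."""
    W1, b1 = pretrain_dae(X, n_hidden)
    w2 = rng.normal(0.0, 0.1, size=n_hidden)
    b2 = 0.0
    n = X.shape[0]
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)
        # Inverted dropout: drop hidden units, rescale the survivors.
        keep = (rng.random(H.shape) > p_drop) / (1.0 - p_drop)
        Hd = H * keep
        err = Hd @ w2 + b2 - y                    # regression residuals
        dA1 = (np.outer(err, w2) / n) * keep * H * (1.0 - H)
        w2 -= lr * (Hd.T @ err) / n
        b2 -= lr * err.mean()
        W1 -= lr * (X.T @ dA1)
        b1 -= lr * dA1.sum(axis=0)
    return W1, b1, w2, b2

def predict(params, X):
    """Predict expression from genotypes; dropout is disabled at test time."""
    W1, b1, w2, b2 = params
    return sigmoid(X @ W1 + b1) @ w2 + b2

# Synthetic stand-in for genotype data: 120 samples, 30 binary markers,
# with one expression trait driven by two of the markers plus noise.
X = rng.integers(0, 2, size=(120, 30)).astype(float)
y = X[:, 0] - X[:, 3] + 0.1 * rng.normal(size=120)
params = fit_mlp_sae(X, y)
print("training MSE:", np.mean((predict(params, X) - y) ** 2))
```

A fuller version would stack several pretrained layers, predict many genes at once, and select hyperparameters by cross-validation; none of those details are specified in this record, so they are left out of the sketch.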

Cited by 75 articles