GRAM: A GeneRAlized Model to predict the molecular effect of a non-coding variant in a cell-type specific manner
Open Access
- 30 August 2019
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Genetics
- Vol. 15 (8), e1007860
- https://doi.org/10.1371/journal.pgen.1007860
Abstract
There has been much effort to prioritize genomic variants with respect to their impact on “function”. However, function is often not precisely defined: sometimes it is the disease association of a variant; on other occasions, it reflects a molecular effect on transcription or epigenetics. Here, we coupled multiple genomic predictors to build GRAM, a GeneRAlized Model, to predict a well-defined experimental target: the expression-modulating effect of a non-coding variant on its associated gene, in a transferable, cell-specific manner. First, we performed feature engineering: using LASSO, a regularized linear model, we found transcription factor (TF) binding most predictive, especially for TFs that are hubs in the regulatory network; in contrast, evolutionary conservation, a popular feature in many other variant-impact predictors, has almost no contribution. Moreover, TF binding inferred from in vitro SELEX is as effective as that from in vivo ChIP-Seq. Second, we implemented GRAM integrating only SELEX features and expression profiles; thus, the program combines a universal regulatory score with an easily obtainable modifier reflecting the particular cell type. We benchmarked GRAM on large-scale MPRA datasets, achieving AUROC scores of 0.72 in GM12878 and 0.66 in a multi-cell line dataset. We then evaluated the performance of GRAM on targeted regions using luciferase assays in the MCF7 and K562 cell lines. We noted that changing the insertion position of the construct relative to the reporter gene gave very different results, highlighting the importance of carefully defining the exact prediction target of the model. Finally, we illustrated the utility of GRAM in fine-mapping causal variants and developed a practical software pipeline to carry this out.
In particular, we demonstrated in specific examples how the pipeline could pinpoint variants that directly modulate gene expression within a larger linkage-disequilibrium block associated with a phenotype of interest (e.g., for an eQTL).
Author summary
With advances in sequencing technologies, a deluge of genomic data is available; however, only a fraction of non-coding genomic variants are functionally relevant. Sifting through this data to prioritize genomic variants with respect to function is an important but challenging task. In this study, we built GRAM, a GeneRAlized Model, to predict the expression-modulating effects of non-coding variants in a cell-specific manner. GRAM combines a universal regulatory score defined by transcription factor binding with an easily obtainable modifier defined by transcription factor binding and expression to reflect the particular cell type. We evaluated this framework on multiple cell lines with high performance and showed that it could be applied to any cell line or sample with gene expression data. We also integrated GRAM into a practical software pipeline to fine-map causal variants that directly modulate gene expression within a larger linkage-disequilibrium block associated with a phenotype of interest. GRAM complements other general variant effect prediction methods, which often combine disparate features, by helping to precisely define the subset of prioritized variants that directly alter gene expression.
This publication has 57 references indexed in Scilit:
- Systematic dissection of regulatory motifs in 2000 predicted human enhancers using a massively parallel reporter assay. Genome Research, 2013
- An integrated map of genetic variation from 1,092 human genomes. Nature, 2012
- Engineered Luciferase Reporter from a Deep Sea Shrimp Utilizing a Novel Imidazopyrazinone Substrate. ACS Chemical Biology, 2012
- Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nature Biotechnology, 2012
- Massively parallel functional dissection of mammalian enhancers in vivo. Nature Biotechnology, 2012
- Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Research, 2010
- Personal genome sequencing: current approaches and challenges. Journal of Bone and Joint Surgery, 2010
- Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nature Biotechnology, 2008
- Revealing the architecture of gene regulation: the promise of eQTL studies. Trends in Genetics, 2008
- Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Research, 2005
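
Illustrative note on the AUROC benchmark mentioned in the abstract: AUROC is the probability that a randomly chosen expression-modulating variant receives a higher predicted score than a randomly chosen non-modulating one. The Python sketch below computes it from labels and scores, and pairs it with a toy scoring rule that weights per-TF binding changes by TF expression in a given cell type. The scoring rule, the function names, the TF names (`TF_A`, `TF_B`) and all numbers are assumptions made up for illustration; this is not the GRAM model or its software pipeline, only a way to make the idea of a "universal score plus cell-type modifier" and its evaluation concrete.

```python
from typing import Dict, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the rank-sum (Mann-Whitney U) identity.

    Counts the fraction of (positive, negative) pairs in which the positive
    has the higher score; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def toy_cell_specific_score(
    binding_change: Dict[str, float], tf_expression: Dict[str, float]
) -> float:
    """Hypothetical score: |binding change| per TF, weighted by that TF's
    expression in the cell type of interest (an assumption, not GRAM itself)."""
    return sum(
        abs(delta) * tf_expression.get(tf, 0.0)
        for tf, delta in binding_change.items()
    )


if __name__ == "__main__":
    # Made-up variants: per-TF binding change and an MPRA-style 0/1 label.
    variants = [
        ({"TF_A": 0.9, "TF_B": 0.1}, 1),
        ({"TF_A": 0.1, "TF_B": 0.8}, 0),
        ({"TF_A": 0.6, "TF_B": 0.0}, 1),
        ({"TF_A": 0.0, "TF_B": 0.2}, 0),
    ]
    # Made-up expression profile for one cell type.
    expression = {"TF_A": 1.0, "TF_B": 0.2}

    labels = [label for _, label in variants]
    scores = [toy_cell_specific_score(change, expression) for change, _ in variants]
    print("AUROC on toy data:", auroc(labels, scores))
```

Swapping in a different expression profile changes the per-variant scores without touching the binding-change values, which is the sense in which the modifier is cell-type specific in this sketch.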