RFECS: A Random-Forest Based Algorithm for Enhancer Identification from Chromatin State

Open Access research article

Published 14 March 2013 by Public Library of Science (PLoS) in PLoS Computational Biology, Vol. 9 (3), e1002968
https://doi.org/10.1371/journal.pcbi.1002968

Abstract

Transcriptional enhancers play critical roles in the regulation of gene expression, but their identification in the eukaryotic genome has been challenging. Recently, it was shown that enhancers in the mammalian genome are associated with characteristic histone modification patterns, which have been increasingly exploited for enhancer identification. However, only a limited number of cell types or chromatin marks have previously been investigated for this purpose, leaving open the question of whether an optimal set of histone modifications exists for enhancer prediction in different cell types. Here, we address this issue by exploring genome-wide profiles of 24 histone modifications in two distinct human cell types, embryonic stem cells and lung fibroblasts. We developed a Random-Forest based algorithm, RFECS (Random Forest based Enhancer identification from Chromatin States), to integrate histone modification profiles for the identification of enhancers, and used it to identify enhancers in several cell types. We show that RFECS not only predicts enhancers more accurately and precisely than previous methods, but also helps identify the most informative and robust set of three chromatin marks for enhancer prediction.

Author Summary

Enhancers are regions in the genome that can activate the expression of a gene irrespective of their location relative to the gene. Identifying these elements is critical to understanding regulatory differences between cell types. Because enhancers lack characteristic sequence features and can lie far from the genes they regulate, identifying them is not trivial. Experimentally determining the genome-wide binding sites of the transcriptional co-activator p300 is one way of finding enhancers, but it identifies only a subset of them. A few years ago, it was observed that p300 binding sites are marked by distinctive post-translational histone modifications. Several groups have exploited this discovery to predict enhancers genome-wide based on their similarity to the histone modification profiles of p300 binding sites. Here we report a new algorithm for this purpose and show that it has much greater accuracy than existing methods. Another unique feature of our algorithm is its ability to automatically deduce the most informative set of histone modifications required for enhancer prediction. We expect this method to become increasingly useful as the number of known histone modifications grows and epigenomic datasets accumulate rapidly for various cell types and species.
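The abstract describes RFECS only at a high level. As a rough illustration of the general technique, and not the authors' implementation, the sketch below trains a small random forest (bootstrap samples plus a random subset of marks considered at each split) on per-window histone-mark signals, scores a window by the fraction of trees voting "enhancer", and ranks marks by mean decrease in Gini impurity to pick a three-mark subset. Assumptions: each genomic window is reduced to a single value per mark (RFECS may use richer profiles), label 1 stands in for p300-bound training windows, and the mark names, toy values and helper names (`train_forest`, `enhancer_score`, `top_marks`) are hypothetical. Gini importance is one common random-forest importance measure; whether RFECS uses it or another variable-importance measure is not stated here.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def majority(labels):
    """Most common label (first seen wins ties)."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, n_try, rng, importance, depth=0, max_depth=4):
    """Grow one CART-style tree; add each split's Gini decrease to importance."""
    if depth >= max_depth or len(y) < 2 or len(set(y)) == 1:
        return majority(y)  # leaf: predicted label
    best = None  # (weighted child impurity, feature, threshold, left, right)
    for f in rng.sample(range(len(X[0])), n_try):  # random subset of marks
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # none of the sampled marks can split this node
        return majority(y)
    score, f, t, left, right = best
    importance[f] += len(y) * (gini(y) - score)  # size-weighted impurity decrease

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          n_try, rng, importance, depth + 1, max_depth)

    return (f, t, grow(left), grow(right))


def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def train_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_try = max(1, int(len(X[0]) ** 0.5))  # about sqrt(#marks) candidates per split
    importance = [0.0] * len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                n_try, rng, importance))
    return trees, importance


def enhancer_score(trees, x):
    """Fraction of trees voting 'enhancer' (label 1) for feature vector x."""
    return sum(predict_tree(t, x) for t in trees) / len(trees)


def top_marks(importance, names, k=3):
    """The k mark names with the largest importance (ties keep input order)."""
    order = sorted(range(len(names)), key=lambda j: importance[j], reverse=True)
    return [names[j] for j in order[:k]]


# Hypothetical toy data: 40 windows, one signal value per mark, label 1 = p300-like.
MARKS = ["H3K4me1", "H3K27ac", "H3K4me2", "H3K36me3"]
X, y = [], []
for i in range(40):
    label = i % 2
    # First three marks separate the classes; the last has the same values in both.
    X.append([i % 5 + 6 * label, i % 3 + 5 * label, i % 4 + 6 * label, (i // 2) % 10])
    y.append(label)

trees, importance = train_forest(X, y)
print(enhancer_score(trees, [9, 7, 8, 3]), enhancer_score(trees, [0, 0, 0, 3]))
print(top_marks(importance, MARKS))
```

On this synthetic data the three marks that separate the classes should outrank H3K36me3, whose values were generated identically for both labels; on real data the ranking would depend on the cell type, the training labels and the importance measure chosen.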
