Epigenetic priors for identifying active transcription factor binding sites

Research article (Open Access)

Published 8 November 2011 by Oxford University Press (OUP) in Bioinformatics, Vol. 28 (1), 56-62
https://doi.org/10.1093/bioinformatics/btr614

Abstract

Motivation: Accurate knowledge of the genome-wide binding of transcription factors in a particular cell type or under a particular condition is necessary for understanding transcriptional regulation. Epigenetic data, such as histone modification and DNase I accessibility data, have been shown to improve motif-based in silico methods for predicting such binding, but this approach has not yet been fully explored.

Results: We describe a probabilistic method for combining one or more tracks of epigenetic data with a standard DNA sequence motif model to improve our ability to identify active transcription factor binding sites (TFBSs). We convert each data type into a position-specific probabilistic prior and combine these priors with a traditional probabilistic motif model to compute a log-posterior odds score. Our experiments, using the histone modifications H3K4me1, H3K4me3, H3K9ac and H3K27ac, as well as DNase I sensitivity, show conclusively that the log-posterior odds score consistently outperforms a simple binary filter based on the same data. We also show that our approach performs competitively with a more complex method, CENTIPEDE, and suggest that the relative simplicity of the log-posterior odds scoring method makes it an appealing and very general method for identifying functional TFBSs on the basis of DNA and epigenetic evidence.

Availability and implementation: FIMO, part of the MEME Suite software toolkit, now supports log-posterior odds scoring using position-specific priors for motif search. A web server and source code are available at http://meme.nbcr.net. Utilities for creating priors are at http://research.imb.uq.edu.au/t.bailey/SD/Cuellar2011.

Contact: t.bailey@uq.edu.au

Supplementary information: Supplementary data are available at Bioinformatics online.
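The scoring idea in the Results can be illustrated with a small sketch. By Bayes' rule, posterior odds equal the likelihood ratio times the prior odds. In log space, the log-posterior odds of a candidate site starting at position i is therefore the usual motif log-likelihood ratio plus log(prior_i / (1 - prior_i)). The sketch below is not the FIMO implementation, and it does not reproduce the authors' prior-construction utilities. Several pieces are assumptions for illustration only: the toy PWM and uniform background, the hypothetical helper names, the linear rescaling of a single epigenetic track into priors in [p_min, p_max], and the choice to read the prior at a site's start position. A binary-filter baseline is included for contrast.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical uniform background and toy PWM (illustrative values only).
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def column(consensus_base: str, p: float = 0.7) -> Dict[str, float]:
    """One PWM column: probability p for the consensus base, the rest split evenly."""
    other = (1.0 - p) / 3.0
    return {b: (p if b == consensus_base else other) for b in "ACGT"}


PWM = [column(b) for b in "ACGT"]  # hypothetical 4-bp motif with consensus ACGT


def log_likelihood_ratio(site: str, pwm, bg: Dict[str, float]) -> float:
    """Standard motif score: log Pr(site | motif) - log Pr(site | background)."""
    return sum(math.log(col[b]) - math.log(bg[b]) for col, b in zip(pwm, site))


def signal_to_priors(signal: Sequence[float], p_min: float = 0.001,
                     p_max: float = 0.5) -> List[float]:
    """ASSUMED mapping: rescale an epigenetic track linearly into [p_min, p_max]."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return [p_min] * len(signal)
    return [p_min + (p_max - p_min) * (s - lo) / (hi - lo) for s in signal]


def log_posterior_odds_scan(seq: str, pwm, bg, priors: Sequence[float]) -> List[float]:
    """Score each start position i as LLR + log(prior_i / (1 - prior_i))."""
    w = len(pwm)
    return [
        log_likelihood_ratio(seq[i:i + w], pwm, bg)
        + math.log(priors[i] / (1.0 - priors[i]))
        for i in range(len(seq) - w + 1)
    ]


def binary_filter_scan(seq: str, pwm, bg, signal: Sequence[float],
                       threshold: float) -> List[float]:
    """Baseline: keep the plain LLR where signal >= threshold, else discard (-inf)."""
    w = len(pwm)
    return [
        log_likelihood_ratio(seq[i:i + w], pwm, bg) if signal[i] >= threshold
        else -math.inf
        for i in range(len(seq) - w + 1)
    ]


if __name__ == "__main__":
    seq = "ACGTAAAAACGT"               # two identical ACGT sites, at positions 0 and 8
    signal = [0.0] * 8 + [10.0] * 4    # toy track: signal only over the second site
    scores = log_posterior_odds_scan(seq, PWM, BACKGROUND, signal_to_priors(signal))
    # The site at position 8 outscores the identical site at position 0.
    print([round(s, 2) for s in scores])
```

In this toy run, the two perfect sites have the same sequence score. The prior lowers the score of the site in the low-signal region without discarding it. The binary filter, by contrast, drops that site entirely (score -inf), however strong its motif match.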


