Epigenetic priors for identifying active transcription factor binding sites
Open Access
- 8 November 2011
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 28 (1), 56-62
- https://doi.org/10.1093/bioinformatics/btr614
Abstract
Motivation: Accurate knowledge of the genome-wide binding of transcription factors in a particular cell type or under a particular condition is necessary for understanding transcriptional regulation. Using epigenetic data such as histone modification and DNase I accessibility data has been shown to improve motif-based in silico methods for predicting such binding, but this approach has not yet been fully explored.
Results: We describe a probabilistic method for combining one or more tracks of epigenetic data with a standard DNA sequence motif model to improve our ability to identify active transcription factor binding sites (TFBSs). We convert each data type into a position-specific probabilistic prior and combine these priors with a traditional probabilistic motif model to compute a log-posterior odds score. Our experiments, using histone modifications H3K4me1, H3K4me3, H3K9ac and H3K27ac, as well as DNase I sensitivity, show conclusively that the log-posterior odds score consistently outperforms a simple binary filter based on the same data. We also show that our approach performs competitively with a more complex method, CENTIPEDE, and suggest that the relative simplicity of the log-posterior odds scoring method makes it an appealing and very general method for identifying functional TFBSs on the basis of DNA and epigenetic evidence.
Availability and implementation: FIMO, part of the MEME Suite software toolkit, now supports log-posterior odds scoring using position-specific priors for motif search. A web server and source code are available at http://meme.nbcr.net. Utilities for creating priors are at http://research.imb.uq.edu.au/t.bailey/SD/Cuellar2011.
Contact: t.bailey@uq.edu.au
Supplementary information: Supplementary data are available at Bioinformatics online.
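The scoring idea summarized in the abstract can be illustrated with a minimal sketch. It assumes the generic Bayesian form of a log-posterior odds score, that is, the motif log-likelihood ratio plus the log prior odds log2(p / (1 - p)), where p is the position-specific prior at the candidate site. The toy motif, the uniform background, the function names and the prior values are all hypothetical; this is not the FIMO implementation, and the abstract does not specify how the priors are derived from the epigenetic tracks.

```python
import math

# Hypothetical 3-column motif model: per-position base probabilities.
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]

# Assumed uniform background model.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_likelihood_ratio(site, pwm, background):
    """Sum over positions of log2(P(base | motif) / P(base | background))."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif width")
    return sum(
        math.log2(column[base] / background[base])
        for base, column in zip(site, pwm)
    )


def log_posterior_odds(site, pwm, background, prior):
    """Log-likelihood ratio plus log prior odds for a prior strictly in (0, 1)."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    return log_likelihood_ratio(site, pwm, background) + math.log2(
        prior / (1.0 - prior)
    )


def binary_filter_score(site, pwm, background, prior, threshold):
    """Baseline for comparison: keep the sequence score only where the
    prior passes a hard threshold, otherwise discard the site (None)."""
    if prior < threshold:
        return None
    return log_likelihood_ratio(site, pwm, background)


if __name__ == "__main__":
    site = "ACG"
    for prior in (0.01, 0.5, 0.8):  # hypothetical prior values
        score = log_posterior_odds(site, TOY_PWM, BACKGROUND, prior)
        print(f"prior={prior:.2f}  log-posterior odds={score:.3f}")
```

In this sketch the prior shifts the sequence score continuously, so a strong motif match in a low-prior region is penalized but still ranked, whereas the binary filter either keeps the unmodified sequence score or drops the site entirely.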