High Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints
Open Access
- 9 August 2012
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 8 (8), e1002638
- https://doi.org/10.1371/journal.pcbi.1002638
Abstract
An essential component of genome function is the syntax of genomic regulatory elements that determine how diverse transcription factors interact to orchestrate a program of regulatory control. A precise characterization of in vivo spacing constraints between key transcription factors would reveal key aspects of this genomic regulatory language. To discover novel transcription factor spatial binding constraints in vivo, we developed a new integrative computational method, genome wide event finding and motif discovery (GEM). GEM resolves ChIP data into explanatory motifs and binding events at high spatial resolution by linking binding event discovery and motif discovery with positional priors in the context of a generative probabilistic model of ChIP data and genome sequence. GEM analysis of 63 transcription factors in 214 ENCODE human ChIP-Seq experiments recovers more known factor motifs than other contemporary methods, and discovers six new motifs for factors with unknown binding specificity. GEM's adaptive learning of binding-event read distributions allows it to further improve upon previous methods for processing ChIP-Seq and ChIP-exo data to yield unsurpassed spatial resolution and discovery of closely spaced binding events of the same factor. In a systematic analysis of in vivo sequence-specific transcription factor binding using GEM, we have found hundreds of spatial binding constraints between factors. GEM found 37 examples of factor binding constraints in mouse ES cells, including strong distance-specific constraints between Klf4 and other key regulatory factors. In human ENCODE data, GEM found 390 examples of spatially constrained pair-wise binding, including such novel pairs as c-Fos:c-Jun/USF1, CTCF/Egr1, and HNF4A/FOXA1. 
The discovery of new factor-factor spatial constraints in ChIP data is significant because it proposes testable models for regulatory factor interactions that will help elucidate genome function and the implementation of combinatorial control.

The letters in our genome spell words and phrases that control when each gene is activated. To understand how these words and phrases function in health and disease, we have developed a new computational method to determine what word positions in our genomic text are used by each genome regulatory protein, and how these active words are spaced relative to one another. Our method achieves exceptional spatial accuracy by integrating experimental data with the text of our genome to find the precise words that are regulated by each protein factor. Using this analysis we have discovered novel word spacings in the experimental data that suggest novel genome grammatical control constructs.