Genome-wide discovery of active regulatory elements and transcription factor footprints in Caenorhabditis elegans using DNase-seq

Abstract
Deep sequencing of size-selected DNase I–treated chromatin (DNase-seq) allows high-resolution measurement of chromatin accessibility to DNase I cleavage, permitting de novo identification of active cis-regulatory modules (CRMs) and individual transcription factor (TF) binding sites. We adapted DNase-seq to nuclei isolated from C. elegans embryos and L1 arrest larvae to generate high-resolution maps of TF binding. Over half of embryonic DNase I hypersensitive sites (DHSs) were annotated as noncoding: 24% in intergenic regions, 12% in promoters, and 28% in introns. Similar statistics were observed in L1 arrest larvae. Noncoding DHSs are highly conserved and enriched in marks of enhancer activity and transcription. We validated noncoding DHSs against known enhancers from myo-2, myo-3, hlh-1, elt-2, and lin-26/lir-1 and recapitulated 15 of 17 known enhancers. We then mined DNase-seq data to identify putative active CRMs and TF footprints. Incorporating DNase-seq data improved predictions of tissue-specific expression compared with using motifs alone. In a pilot functional test, 10 of 15 DHSs from pha-4, icl-1, and ceh-13 drove reporter gene expression in transgenic C. elegans. Overall, we provide experimental annotation of 26,644 putative CRMs in the embryo containing 55,890 TF footprints, as well as 15,841 putative CRMs in L1 arrest larvae containing 32,685 TF footprints.
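TF footprints in DNase-seq data are generally understood as short stretches inside accessible regions where a bound factor protects DNA from DNase I cleavage, so the core receives fewer cuts than its flanks. The sketch below illustrates that general idea on a per-base cleavage-count profile. It is a generic, hypothetical example and not the footprinting method used in this study. The function names `footprint_score` and `best_footprint`, the flank-to-core ratio, and the pseudocount of 1 are all illustrative assumptions.

```python
from typing import Optional, Sequence, Tuple


def footprint_score(cuts: Sequence[int], start: int, end: int, flank: int) -> float:
    """Hypothetical flank-to-core cleavage ratio for a candidate core [start, end).

    Higher scores mean the core is more depleted of DNase I cuts than the
    `flank` bases on either side, as expected for a protected (bound) site.
    """
    if end <= start or flank <= 0 or start - flank < 0 or end + flank > len(cuts):
        raise ValueError("core or flank window falls outside the profile")
    core = sum(cuts[start:end]) / (end - start)       # mean cuts in the core
    left = sum(cuts[start - flank:start]) / flank     # mean cuts, left flank
    right = sum(cuts[end:end + flank]) / flank        # mean cuts, right flank
    # A pseudocount of 1 avoids division by zero for fully protected cores.
    return (left + 1) / (core + 1) + (right + 1) / (core + 1)


def best_footprint(
    cuts: Sequence[int], core_width: int, flank: int
) -> Optional[Tuple[int, float]]:
    """Slide a fixed-width core along the profile; return (start, score) of the top hit."""
    best: Optional[Tuple[int, float]] = None
    for start in range(flank, len(cuts) - core_width - flank + 1):
        score = footprint_score(cuts, start, start + core_width, flank)
        if best is None or score > best[1]:
            best = (start, score)
    return best


if __name__ == "__main__":
    # Toy profile: accessible flanks with a 4-bp protected core at positions 6-9.
    profile = [10] * 6 + [0] * 4 + [10] * 6
    print(best_footprint(profile, core_width=4, flank=3))
```

On the toy profile, the scan locates the protected core at position 6. Real footprinting pipelines typically also correct for DNase I sequence bias and assess significance against a background model, which this sketch omits.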
Funding Information
  • NIH
  • National Institute of General Medical Sciences (GM084389)
  • Howard Hughes Medical Institute (047101)
  • National Science Foundation