Distant conserved sequences flanking endothelial-specific promoters contain tissue-specific DNase-hypersensitive sites and over-represented motifs

Abstract
The transcriptional regulation of genes is a complex process, particularly for genes exhibiting a tissue-specific pattern of expression. We studied 28 genes that are expressed primarily in endothelial cells, another 28 genes that are expressed highly, but not exclusively, in cultured endothelial cells, and three control sets consisting of genes not expressed in endothelium, genes expressed in neural tissues, and housekeeping genes. For each gene, we identified conserved non-coding sequences (CNSs) of lengths 50 to >1000 nucleotides, located within the upstream intergenic region (from 500 to as far as 200 000 nucleotides upstream of the transcription start site) or within the first intron. As a functional test, we assayed the CNSs from the set of endothelial cell-specific genes (EC-CNSs) for DNase hypersensitivity. Among 262 distant EC-CNSs, 33% are hypersensitive (HS) in endothelial cells, whereas only 16% are HS in control fibroblasts. A search for short sequence patterns revealed a number of motifs that are over-represented in EC-CNSs relative to CNSs from the control gene sets. In particular, the motif SAGGAAR is strongly and consistently over-represented among EC-CNSs, and is more over-represented in HS CNSs than in non-HS CNSs. CNSs that contain this motif are no closer to the promoter than an average CNS. This motif contains the core element of binding sites for the Ets family of transcription factors. Thus, one or several factors from this family may play a key role in the regulation of endothelial gene expression.
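As an illustration of what counting a degenerate motif such as SAGGAAR involves, the following is a minimal Python sketch, not the procedure used in this study. It assumes standard IUPAC nucleotide codes (S = C or G, R = A or G), scans both strands, and allows overlapping matches. The function names `count_motif` and `motif_density`, and the per-kilobase density measure, are hypothetical choices made for this example; the statistic actually used to assess over-representation is not specified here.

```python
import re

# Standard IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif):
    """Return the reverse complement of an IUPAC motif."""
    return motif.translate(COMPLEMENT)[::-1]


def motif_regex(motif):
    """Compile a lookahead regex so that overlapping matches are counted."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + pattern + "))")


def count_motif(sequence, motif="SAGGAAR"):
    """Count motif occurrences on both strands of a DNA sequence.

    Note: a palindromic motif would be counted twice per site;
    SAGGAAR is not palindromic, so this does not arise here.
    """
    sequence = sequence.upper()
    forward = motif_regex(motif)
    reverse = motif_regex(reverse_complement(motif))
    return len(forward.findall(sequence)) + len(reverse.findall(sequence))


def motif_density(sequences, motif="SAGGAAR"):
    """Occurrences per 1000 nucleotides across a set of sequences
    (a hypothetical summary measure chosen for this sketch)."""
    total_length = sum(len(s) for s in sequences)
    if total_length == 0:
        return 0.0
    total_hits = sum(count_motif(s, motif) for s in sequences)
    return 1000.0 * total_hits / total_length
```

Under these assumptions, comparing `motif_density` for a set of EC-CNS sequences with the value for a control set would give a crude enrichment ratio; a real analysis would also need to account for differences in sequence length and base composition between the sets.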