Mice and Men: Their Promoter Properties

Abstract
Using the two largest collections of Mus musculus and Homo sapiens transcription start sites (TSSs) determined based on CAGE tags, ditags, full-length cDNAs, and other transcript data, we describe the compositional landscape surrounding TSSs with the aim of gaining better insight into the properties of mammalian promoters. We classified TSSs into four types based on compositional properties of regions immediately surrounding them. These properties highlighted distinctive features in the extended core promoters that helped us delineate boundaries of the transcription initiation domain space for both species. The TSS types were analyzed for associations with initiating dinucleotides, CpG islands, TATA boxes, and an extensive collection of statistically significant cis-elements in mouse and human. We found that different TSS types show preferences for different sets of initiating dinucleotides and cis-elements. Through Gene Ontology and eVOC categories and tissue expression libraries we linked TSS characteristics to expression. Moreover, using immune-response-related genes (GO:0006955) as an example, we show that TSS characteristics are linked to very specific genomic organization. Our results shed light on global properties of the two transcriptomes not previously revealed and therefore provide a framework for better understanding of the transcriptional mechanisms in the two species, as well as for development of new and more efficient promoter- and gene-finding tools.

Tens of thousands of mammalian genes are expressed in various cells at different times, controlled mainly at the promoter level through the interaction of transcription factors with cis-elements. The authors analyzed properties of a large collection of experimental mouse (Mus musculus) and human (Homo sapiens) transcription start sites (TSSs).
They defined four types of TSSs based on the compositional properties of surrounding regions and showed that (a) the regions surrounding TSSs are much richer in properties than previously thought, (b) the four TSS types are associated with distinct groups of cis-elements and initiating dinucleotides, (c) the regions upstream of TSSs are distinctly different from the downstream ones in terms of the associated cis-elements, and (d) mouse and human TSS properties relative to CpG islands (CGIs) and TATA box elements suggest species-specific adaptation. The authors linked TSS characteristics to gene expression through categories defined by the Gene Ontology and eVOC classifications and tissue expression libraries. They provided examples of the preference of immune response genes for TSS types and specific genomic organization. Their results shed light on the fine compositional properties of TSSs in mammals and could lead to better design of promoter- and gene-finding tools, better annotation of promoters by cis-elements, and better regulatory network reconstructions. These areas represent some of the focal topics of bioinformatics and genomics research that are of interest to a wide range of life scientists.