Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals

Open Access
Abstract
Mammalian genomes are transcribed to produce numerous large non-coding RNAs, but their function is unclear, primarily because these transcripts show little or no evidence of evolutionary conservation. A new approach to characterizing these mysterious molecules has now moved the field forward. Rather than targeting the RNA molecules themselves, the authors detected their transcription through characteristic chromatin modifications (epigenomic marks) in four mouse cell types. The search yielded over a thousand large multi-exonic transcriptional units that do not overlap known protein-coding loci and are highly conserved. Possible functions could be assigned to each of these large intervening non-coding RNAs (lincRNAs), ranging from embryonic stem cell pluripotency to cell proliferation. Specific lincRNAs turn out to be regulated by transcription factors that are key in these processes, including p53, NFκB, Sox2, Oct4 and Nanog, and most of these lincRNAs are conserved across mammals.

This study uses chromatin marks in four mouse cell types to identify ∼1,600 large multi-exonic transcriptional units that do not overlap known protein-coding loci and are highly conserved. Putative functions, ranging from ES-cell pluripotency to cell proliferation, are assigned to each of these large intervening non-coding RNAs.

There is growing recognition that mammalian cells produce many thousands of large intergenic transcripts [1–4]. However, the functional significance of these transcripts has been particularly controversial. Although there are some well-characterized examples, most (>95%) show little evidence of evolutionary conservation and have been suggested to represent transcriptional noise [5,6]. Here we report a new approach to identifying large non-coding RNAs using chromatin-state maps to discover discrete transcriptional units lying between known protein-coding loci. Our approach identified ∼1,600 large multi-exonic RNAs across four mouse cell types. In sharp contrast to previous collections, these large intervening non-coding RNAs (lincRNAs) show strong purifying selection in their genomic loci, exonic sequences and promoter regions, with greater than 95% showing clear evolutionary conservation. We also developed a functional genomics approach that assigns putative functions to each lincRNA, demonstrating a diverse range of roles for lincRNAs in processes from embryonic stem cell pluripotency to cell proliferation. We obtained independent functional validation of the predictions for over 100 lincRNAs, using cell-based assays. In particular, we demonstrate that specific lincRNAs are transcriptionally regulated by key transcription factors in these processes, such as p53, NFκB, Sox2, Oct4 (also known as Pou5f1) and Nanog. Together, these results define a unique collection of functional lincRNAs that are highly conserved and implicated in diverse biological processes.
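The abstract does not describe how candidate units are checked against known genes. As a minimal sketch, assuming candidate transcriptional units and protein-coding loci are both given as half-open genomic intervals, the criterion "does not overlap known protein-coding loci" reduces to an interval-overlap test. This is not the authors' pipeline: it omits the chromatin-based discovery step and the multi-exon requirement, and the names `Interval`, `overlaps` and `intergenic_candidates`, as well as all coordinates, are hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A half-open genomic interval [start, end) on one chromosome (hypothetical type)."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def intergenic_candidates(candidates: list[Interval],
                          coding_loci: list[Interval]) -> list[Interval]:
    """Keep only candidate units that overlap no known protein-coding locus.

    Hypothetical helper: a simple quadratic scan over all coding loci.
    """
    return [c for c in candidates
            if not any(overlaps(c, g) for g in coding_loci)]


# Hypothetical coordinates, for illustration only.
coding = [Interval("chr1", 1_000, 5_000), Interval("chr2", 0, 2_000)]
cands = [
    Interval("chr1", 4_500, 8_000),    # overlaps the chr1 coding locus, dropped
    Interval("chr1", 10_000, 15_000),  # intergenic, kept
    Interval("chr2", 2_000, 3_000),    # abuts but does not overlap, kept
]
print(intergenic_candidates(cands, coding))
```

The quadratic scan keeps the sketch short; at genome scale an interval tree or a sorted sweep over each chromosome would be the usual choice.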