Systematic identification of long noncoding RNAs expressed during zebrafish embryogenesis

Open Access
Abstract
Long noncoding RNAs (lncRNAs) comprise a diverse class of transcripts that structurally resemble mRNAs but do not encode proteins. Recent genome-wide studies in human and mouse have annotated lncRNAs expressed in cell lines and adult tissues, but a systematic analysis of lncRNAs expressed during vertebrate embryogenesis has been elusive. To identify lncRNAs with potential functions in vertebrate embryogenesis, we performed a time series of RNA-seq experiments at eight stages during early zebrafish development. We reconstructed 56,535 high-confidence transcripts in 28,912 loci, recovering the vast majority of expressed RefSeq transcripts while identifying thousands of novel isoforms and expressed loci. We defined a stringent set of 1133 noncoding multi-exonic transcripts expressed during embryogenesis. These include long intergenic ncRNAs (lincRNAs), intronic overlapping lncRNAs, exonic antisense overlapping lncRNAs, and precursors for small RNAs (sRNAs). Zebrafish lncRNAs share many of the characteristics of their mammalian counterparts: relatively short length, low exon number, low expression, and conservation levels comparable to those of introns. Subsets of lncRNAs carry chromatin signatures characteristic of genes with developmental functions. The temporal expression profile of lncRNAs revealed two novel properties: lncRNAs are expressed in narrower time windows than are protein-coding genes and are specifically enriched in early-stage embryos. In addition, several lncRNAs show tissue-specific expression and distinct subcellular localization patterns. Integrative computational analyses associated individual lncRNAs with specific pathways and functions, ranging from cell cycle regulation to morphogenesis. Our study provides the first systematic identification of lncRNAs in a vertebrate embryo and forms the foundation for future genetic, genomic, and evolutionary studies.