Prominent use of distal 5′ transcription start sites and discovery of a large number of additional exons in ENCODE regions
Open Access
- 13 June 2007
- journal article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 17 (6), 746-759
- https://doi.org/10.1101/gr.5660607
Abstract
This report presents systematic empirical annotation of transcript products from 399 annotated protein-coding loci across the 1% of the human genome targeted by the Encyclopedia of DNA elements (ENCODE) pilot project using a combination of 5′ rapid amplification of cDNA ends (RACE) and high-density resolution tiling arrays. We identified previously unannotated and often tissue- or cell-line-specific transcribed fragments (RACEfrags), both 5′ distal to the annotated 5′ terminus and internal to the annotated gene bounds for the vast majority (81.5%) of the tested genes. Half of the distal RACEfrags span large segments of genomic sequences away from the main portion of the coding transcript and often overlap with the upstream-annotated gene(s). Notably, at least 20% of the resultant novel transcripts have changes in their open reading frames (ORFs), most of them fusing ORFs of adjacent transcripts. A significant fraction of distal RACEfrags show expression levels comparable to those of known exons of the same locus, suggesting that they are not part of very minority splice forms. These results have significant implications concerning (1) our current understanding of the architecture of protein-coding genes; (2) our views on locations of regulatory regions in the genome; and (3) the interpretation of sequence polymorphisms mapping to regions hitherto considered to be “noncoding,” ultimately relating to the identification of disease-related sequence alterations.