RNA-Seq improves annotation of protein-coding genes in the cucumber genome
Open Access
- 1 November 2011
- journal article
- research article
- Published by Springer Science and Business Media LLC in BMC Genomics
- Vol. 12 (1), 540
- https://doi.org/10.1186/1471-2164-12-540
Abstract
Background: As more and more genomes are sequenced, genome annotation becomes increasingly important in bridging the gap between sequence and biology. Gene prediction, which is at the center of genome annotation, usually integrates various resources to compute consensus gene structures. However, many newly sequenced genomes have limited resources for gene predictions. In an effort to create high-quality gene models of the cucumber genome (Cucumis sativus var. sativus), based on the EVidenceModeler gene prediction pipeline, we incorporated the massively parallel complementary DNA sequencing (RNA-Seq) reads of 10 cucumber tissues into EVidenceModeler. We applied the new pipeline to the reassembled cucumber genome and included a comparison between our predicted protein-coding gene sets and a published set.
Results: The reassembled cucumber genome, annotated with RNA-Seq reads from 10 tissues, has 23,248 identified protein-coding genes. Compared with the published prediction in 2009, approximately 8,700 genes reveal structural modifications and 5,285 genes only appear in the reassembled cucumber genome. All the related results, including genome sequence and annotations, are available at http://cmb.bnu.edu.cn/Cucumis_sativus_v20/.
Conclusions: We conclude that RNA-Seq greatly improves the accuracy of prediction of protein-coding genes in the reassembled cucumber genome. The comparison between the two gene sets also suggests that it is feasible to use RNA-Seq reads to annotate newly sequenced or less-studied genomes.
This publication has 52 references indexed in Scilit:
- Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology, 2011
- Ab initio reconstruction of cell type–specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nature Biotechnology, 2010
- Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology, 2010
- Angiosperm genome comparisons reveal early polyploidy in the monocot lineage. Proceedings of the National Academy of Sciences of the United States of America, 2009
- RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics, 2009
- SOAP: short oligonucleotide alignment program. Bioinformatics, 2008
- Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics, 2008
- The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature, 2007
- De novo identification of repeat families in large genomes. Bioinformatics, 2005
- BLAT—The BLAST-Like Alignment Tool. Genome Research, 2002