iAssembler: a package for de novo assembly of Roche-454/Sanger transcriptome sequences
Open Access
- 23 November 2011
- journal article
- Published by Springer Science and Business Media LLC in BMC Bioinformatics
- Vol. 12 (1), 453
- https://doi.org/10.1186/1471-2105-12-453
Abstract
Expressed Sequence Tags (ESTs) have played significant roles in gene discovery and gene functional analysis, especially for non-model organisms. For organisms with no full genome sequence available, ESTs are normally assembled into longer consensus sequences for downstream analysis. However, current de novo EST assembly programs often generate a large number of assembly errors that negatively affect downstream analysis. To generate more accurate consensus sequences from ESTs, tools are needed to reduce or eliminate errors from de novo assemblies. We present iAssembler, a pipeline that can assemble large-scale ESTs into consensus sequences with significantly higher accuracy than existing assemblers. iAssembler employs the MIRA and CAP3 assemblers to generate initial assemblies, then identifies and corrects two common types of transcriptome assembly errors: 1) ESTs from different transcripts (mainly alternatively spliced transcripts or paralogs) are incorrectly assembled into the same contigs; and 2) ESTs from the same transcripts fail to be assembled together. iAssembler can be used to assemble ESTs generated using the traditional Sanger method and/or the Roche-454 massively parallel pyrosequencing technology. We compared the performance of iAssembler and several other de novo EST assembly programs using both Roche-454 and Sanger EST datasets. The comparison showed that iAssembler generated significantly more accurate consensus sequences than the other assembly programs.
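
The two error types named in the abstract can be illustrated with a small toy sketch. This is not iAssembler's implementation: the abstract does not describe how errors are detected, so the function names, the 5% mismatch threshold, the 8-base minimum overlap, and the gap-free position-by-position comparison below are all assumptions made for illustration. A real pipeline would rely on proper sequence alignment.

```python
def mismatches(a, b):
    """Count differing positions; any length difference also counts as mismatches."""
    return sum(1 for x, y in zip(a, b) if x != y) + abs(len(a) - len(b))


def best_overlap(left, right, min_len=8, max_mismatch_rate=0.05):
    """Length of the longest suffix of `left` that matches a prefix of `right`
    within the allowed mismatch rate, or 0 if none reaches `min_len`."""
    for length in range(min(len(left), len(right)), min_len - 1, -1):
        if mismatches(left[-length:], right[:length]) <= max_mismatch_rate * length:
            return length
    return 0


def merge_candidates(contigs, min_len=8, max_mismatch_rate=0.05):
    """Error type 2 (hypothetical check): contigs that overlap end to start
    and so may come from the same transcript but were not assembled together."""
    pairs = []
    for name_a, seq_a in contigs.items():
        for name_b, seq_b in contigs.items():
            if name_a != name_b and best_overlap(
                seq_a, seq_b, min_len, max_mismatch_rate
            ) > 0:
                pairs.append((name_a, name_b))
    return pairs


def misassembled_reads(consensus, placed_reads, max_mismatch_rate=0.05):
    """Error type 1 (hypothetical check): reads that disagree too much with
    the consensus at their placed offset and may belong to another transcript."""
    flagged = []
    for name, (offset, seq) in placed_reads.items():
        window = consensus[offset:offset + len(seq)]
        if mismatches(seq, window) > max_mismatch_rate * len(seq):
            flagged.append(name)
    return flagged


# Made-up sequences, for illustration only.
contigs = {
    "c1": "ACGTACGTGGCCTTAA",
    "c2": "GGCCTTAATTTTCCCC",
    "c3": "TTGACATGCATGCAAG",
}
consensus = "ACGTACGTGGCCTTAATTTTCCCC"
placed_reads = {
    "r1": (0, "ACGTACGTGG"),
    "r2": (8, "GGCCTTAATT"),
    "r3": (4, "ACGAACCTGG"),
}

print(merge_candidates(contigs))
print(misassembled_reads(consensus, placed_reads))
```

In this toy data, `c1` and `c2` share an 8-base end-to-start overlap and are reported as a merge candidate, while read `r3` disagrees with the consensus at its placed position and is flagged for removal from the contig.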