Software for pre-processing Illumina next-generation sequencing short read sequences

Open Access

Abstract

Compared with Sanger sequencing, next-generation sequencing (NGS) technologies are hindered by shorter read lengths, higher base-call error rates, non-uniform coverage, and platform-specific sequencing artifacts. These characteristics lower the quality of downstream analyses, e.g. de novo and reference-based assembly, by introducing artifacts and errors that may lead to incorrect interpretation of the data. Although many tools have been developed for quality control and pre-processing of NGS data, none of them provides flexible and comprehensive trimming options in conjunction with parallel processing to expedite pre-processing of large NGS datasets.
