TagDust—a program to eliminate artifacts from next generation sequencing data

Open Access

7 September 2009

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 25 (21), 2839-2840
https://doi.org/10.1093/bioinformatics/btp527

Abstract

Motivation: Next-generation parallel sequencing technologies produce large quantities of short sequence reads. Due to experimental procedures, various types of artifacts are commonly sequenced alongside the targeted RNA or DNA sequences. Identification of such artifacts is important during the development of novel sequencing assays and for the downstream analysis of the sequenced libraries.

Results: Here we present TagDust, a program identifying artifactual sequences in large sequencing runs. Given a user-defined cutoff for the false discovery rate, TagDust identifies all reads explainable by combinations and partial matches to known sequences used during library preparation. We demonstrate the quality of our method on sequencing runs performed on Illumina's Genome Analyzer platform.

Availability: Executables and documentation are available from http://genome.gsc.riken.jp/osc/english/software/.

Contact: timolassmann@gmail.com
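The abstract does not spell out how reads are scored, so the following is only a minimal sketch of the general idea, not TagDust's actual implementation. It assumes a simple k-mer coverage score (the fraction of a read covered by k-mers shared with the library-preparation sequences), an empirical null built from random reads with uniform base composition, and a Benjamini-Hochberg step to apply the user-defined false discovery rate cutoff. The function names, the value of `k`, the null model, and the example adapter string are all hypothetical choices made for illustration.

```python
import bisect
import random


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def covered_fraction(read, library_kmers, k):
    """Fraction of read bases inside at least one k-mer shared with the library."""
    if not read:
        return 0.0
    covered = [False] * len(read)
    for i in range(len(read) - k + 1):
        if read[i:i + k] in library_kmers:
            covered[i:i + k] = [True] * k
    return sum(covered) / len(read)


def flag_artifacts(reads, library_seqs, k=8, fdr=0.01, n_null=2000, seed=0):
    """Return sorted indices of reads flagged as likely artifacts."""
    if not reads:
        return []
    rng = random.Random(seed)
    library_kmers = set()
    for seq in library_seqs:
        library_kmers |= kmers(seq, k)

    # Empirical null (assumption): random reads with uniform base composition,
    # with lengths drawn from the observed reads.
    null_scores = sorted(
        covered_fraction(
            "".join(rng.choice("ACGT") for _ in range(len(rng.choice(reads)))),
            library_kmers,
            k,
        )
        for _ in range(n_null)
    )

    # Empirical p-value: share of null reads scoring at least as high as the read.
    pvals = []
    for read in reads:
        score = covered_fraction(read, library_kmers, k)
        n_ge = n_null - bisect.bisect_left(null_scores, score)
        pvals.append((1 + n_ge) / (1 + n_null))

    # Benjamini-Hochberg: find the largest rank whose p-value passes the cutoff.
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= fdr * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


if __name__ == "__main__":
    adapter = "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG"  # hypothetical library sequence
    reads = [
        adapter[:30],                    # partial match to the adapter
        adapter[:15] + adapter[18:],     # combination of two adapter fragments
        "ACACACACACACACACACACACACACACAC",  # unrelated read
    ]
    print(flag_artifacts(reads, [adapter], k=8, fdr=0.05))
```

Scoring by coverage rather than by a single best alignment is what lets a read stitched together from several library fragments score as highly as a read that matches one fragment end to end. A real tool would likely use a null model matched to the genome's base composition and a far larger read set.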

