FastUniq: A Fast De Novo Duplicates Removal Tool for Paired Short Reads
Open Access
- 20 December 2012
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 7 (12), e52249
- https://doi.org/10.1371/journal.pone.0052249
Abstract
Duplicates introduced by PCR amplification are a major issue in paired short reads from next-generation sequencing platforms. These duplicates can seriously affect research applications, such as scaffolding in whole-genome sequencing and the discovery of large-scale genome variations, and are usually removed. We present FastUniq, a fast de novo tool for removing duplicates in paired short reads. FastUniq identifies duplicates by comparing sequences between read pairs and does not require a complete genome sequence as a prerequisite. FastUniq can handle reads of different lengths simultaneously, and its running time increases linearly with the number of reads, at an average rate of 87 million reads per 10 minutes. FastUniq is freely available at http://sourceforge.net/projects/fastuniq/.
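To illustrate what reference-free (de novo) duplicate removal means for paired reads, the following is a minimal Python sketch. It is not FastUniq's implementation: the function name `remove_duplicate_pairs`, the in-memory set, and the exact-match rule (a pair counts as a duplicate only when both mate sequences are identical, ignoring case) are assumptions made for illustration. The abstract does not specify how reads of different lengths are compared, so this sketch simply treats sequences of different lengths as distinct.

```python
from typing import Iterable, Iterator, Tuple

# A read pair is modelled as (mate 1 sequence, mate 2 sequence).
ReadPair = Tuple[str, str]


def remove_duplicate_pairs(pairs: Iterable[ReadPair]) -> Iterator[ReadPair]:
    """Yield each distinct read pair once, keeping the first occurrence.

    Hypothetical helper: pairs are compared by sequence only, so no
    reference genome or alignment step is needed.
    """
    seen = set()
    for mate1, mate2 in pairs:
        # Both mates must match for the pair to count as a duplicate.
        key = (mate1.upper(), mate2.upper())
        if key not in seen:
            seen.add(key)
            yield mate1, mate2


pairs = [
    ("ACGTAC", "TTGCA"),
    ("ACGTAC", "TTGCA"),  # exact duplicate of the first pair: dropped
    ("ACGTAC", "TTGCC"),  # same mate 1, different mate 2: kept
    ("GGA", "TTGCA"),     # different mate 1, same mate 2: kept
]
unique_pairs = list(remove_duplicate_pairs(pairs))
print(unique_pairs)
```

Comparing the pair as a unit, rather than each mate separately, is the point of the example: two fragments that share one end but differ at the other are kept as distinct.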