fastp: an ultra-fast all-in-one FASTQ preprocessor
Open Access
- 1 September 2018
- journal article
- conference paper
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 34 (17), i884-i890
- https://doi.org/10.1093/bioinformatics/bty560
Abstract
Quality control and preprocessing of FASTQ files are essential to providing clean data for downstream analysis. Traditionally, a different tool is used for each operation, such as quality control, adapter trimming and quality filtering. These tools are often insufficiently fast as most are developed using high-level programming languages (e.g. Python and Java) and provide limited multi-threading support. Reading and loading data multiple times also renders preprocessing slow and I/O inefficient. We developed fastp as an ultra-fast FASTQ preprocessor with useful quality control and data-filtering features. It can perform quality control, adapter trimming, quality filtering, per-read quality pruning and many other operations with a single scan of the FASTQ data. This tool is developed in C++ and has multi-threading support. Based on our evaluation, fastp is 2–5 times faster than other FASTQ preprocessing tools such as Trimmomatic or Cutadapt despite performing far more operations than similar tools. The open-source code and corresponding instructions are available at https://github.com/OpenGene/fastp.
Funding Information
- Special Funds for Future Industries of Shenzhen (JSGG20160229123927512)
- National Science Foundation of China (61472411)
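The abstract's central design point is that fastp performs quality control, per-read quality pruning and quality filtering during a single scan of the FASTQ data instead of re-reading the file once per tool. The Python sketch below illustrates only that single-pass idea, under stated assumptions: Phred+33 quality encoding, a sliding-window cut from the 3' end, and hypothetical parameter names and thresholds (`qualified_q`, `max_unqualified_frac`, `min_len`, `window`, `min_mean`) chosen for illustration. It is not fastp's implementation (which is C++ and multi-threaded), it is not fastp's API, and it omits adapter trimming.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 quality encoding

Record = Tuple[str, str, str]  # (header, sequence, quality string)


@dataclass
class Stats:
    """QC counters gathered in the same pass that prunes and filters reads."""
    reads_in: int = 0
    reads_out: int = 0
    bases_in: int = 0
    q20_bases_in: int = 0


def parse_fastq(text: str) -> Iterator[Record]:
    """Yield records from well-formed 4-line FASTQ text (no validation)."""
    lines = [line for line in text.splitlines() if line]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i], lines[i + 1], lines[i + 3]


def cut_tail_index(quals: List[int], window: int, min_mean: float) -> int:
    """Return the new read end after sliding a window in from the 3' end.

    One base is dropped at a time while the mean quality of the trailing
    window is below min_mean; the scan stops at the first window that passes.
    """
    end = len(quals)
    while end > 0:
        win = quals[max(0, end - window):end]
        if sum(win) / len(win) >= min_mean:
            break
        end -= 1
    return end


def preprocess(text: str, qualified_q: int = 15, max_unqualified_frac: float = 0.4,
               min_len: int = 15, window: int = 4,
               min_mean: float = 20) -> Tuple[List[Record], Stats]:
    """Hypothetical single-pass pipeline: QC stats, tail pruning, filtering."""
    stats = Stats()
    kept: List[Record] = []
    for header, seq, qual in parse_fastq(text):
        quals = [ord(c) - PHRED_OFFSET for c in qual]
        # 1) Quality control: update statistics on the raw read.
        stats.reads_in += 1
        stats.bases_in += len(quals)
        stats.q20_bases_in += sum(q >= 20 for q in quals)
        # 2) Per-read quality pruning of the low-quality 3' tail.
        end = cut_tail_index(quals, window, min_mean)
        seq, qual, quals = seq[:end], qual[:end], quals[:end]
        # 3) Length filter, then filter on the fraction of low-quality bases.
        if len(quals) < min_len:
            continue
        unqualified = sum(q < qualified_q for q in quals)
        if unqualified > max_unqualified_frac * len(quals):
            continue
        kept.append((header, seq, qual))
        stats.reads_out += 1
    return kept, stats


# Toy reads: r1 has a low-quality tail, r2 is entirely low quality,
# r3 alternates high/low quality, so it survives pruning but fails filtering.
fastq = "\n".join([
    "@r1", "A" * 25, "+", "I" * 20 + "#" * 5,
    "@r2", "C" * 25, "+", "#" * 25,
    "@r3", "G" * 20, "+", "I#" * 10,
]) + "\n"

kept, stats = preprocess(fastq)
print(kept)
print(stats)
```

With these toy reads only `r1` is kept, pruned from 25 to 22 bases; `r2` is pruned to nothing and fails the length check, and `r3` keeps its full length but has too many bases below `qualified_q`. Because the statistics are updated inside the same loop that prunes and filters, the input is read once, which is the I/O saving the abstract attributes to a single scan.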
This publication has 15 references indexed in Scilit:
- SpeedSeq: ultra-fast personal genome analysis and interpretation. Nature Methods, 2015
- Noninvasive Prenatal Testing and Incidental Detection of Occult Maternal Malignancies. JAMA, 2015
- Detecting ultralow-frequency mutations by Duplex Sequencing. Nature Protocols, 2014
- Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 2014
- Fast gapped-read alignment with Bowtie 2. Nature Methods, 2012
- Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.Journal, 2011
- Measuring dementia carers' unmet need for services - an exploratory mixed method study. BMC Health Services Research, 2010
- The Sequence Alignment/Map format and SAMtools. Bioinformatics, 2009
- Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 2009
- PAnnBuilder: an R package for assembling proteomic annotation data. Bioinformatics, 2009