SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation
Open Access
- 5 October 2016
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 11 (10), e0163962
- https://doi.org/10.1371/journal.pone.0163962
Abstract
FASTA and FASTQ are basic and ubiquitous formats for storing nucleotide and protein sequences. Common manipulations of FASTA/Q files include converting, searching, filtering, deduplication, splitting, shuffling, and sampling. Existing tools implement only some of these manipulations, and not particularly efficiently, and some are available only for certain operating systems. Furthermore, the complicated installation process of required packages and running environments can render these programs less user-friendly. This paper describes a cross-platform ultrafast comprehensive toolkit for FASTA/Q processing. SeqKit provides executable binary files for all major operating systems, including Windows, Linux, and Mac OS X, and can be directly used without any dependencies or pre-configurations. SeqKit demonstrates competitive performance in execution time and memory usage compared to similar tools. The efficiency and usability of SeqKit enable researchers to rapidly accomplish common FASTA/Q file manipulations. SeqKit is open source and available on GitHub at https://github.com/shenwei356/seqkit.
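To make two of the manipulations named in the abstract concrete (filtering and deduplication), the following is a minimal Python sketch. It is not SeqKit's implementation and does not use SeqKit. The function names `parse_fasta`, `filter_by_length`, and `deduplicate`, the length threshold, and the example records are all hypothetical, chosen only for illustration. It assumes a small, well-formed FASTA text held in memory and treats sequences as duplicates when they match case-insensitively.

```python
from typing import Iterable, Iterator, List, Tuple

# A record is (header without the leading '>', sequence with line breaks removed).
Record = Tuple[str, str]


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Record], min_len: int) -> List[Record]:
    """Keep records whose sequence has at least min_len residues."""
    return [r for r in records if len(r[1]) >= min_len]


def deduplicate(records: Iterable[Record]) -> List[Record]:
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    unique: List[Record] = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique


# Hypothetical input: seq2 duplicates seq1, seq3 is short, seq4 spans two lines.
EXAMPLE = """>seq1
ACGTACGT
>seq2
acgtacgt
>seq3
ACG
>seq4
TTGACA
GGA
"""

records = list(parse_fasta(EXAMPLE))
kept = deduplicate(filter_by_length(records, min_len=5))
for header, seq in kept:
    print(f">{header}\n{seq}")
```

With this input, `seq3` is dropped by the length filter and `seq2` is dropped as a duplicate of `seq1`. A sketch like this holds every record in memory, which is exactly the kind of cost a dedicated toolkit aims to avoid on large files.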
Funding Information
- National Natural Science Foundation of China (81373133)
- National Natural Science Foundation of China (31570173)