alignparse: A Python package for parsing complex features from high-throughput long-read sequencing
Open Access
- 11 December 2019
- journal article
- Published by The Open Journal in The Journal of Open Source Software
- Vol. 4 (44), 1915
- https://doi.org/10.21105/joss.01915
Abstract
Advances in sequencing technology have made it possible to generate large numbers of long, high-accuracy sequencing reads. For instance, the new PacBio Sequel platform can generate hundreds of thousands of high-quality circular consensus sequences in a single run (Hebert et al., 2018; Rhoads & Au, 2015). Good programs exist for aligning these reads for genome assembly (Chaisson & Tesler, 2012; Li, 2018). However, these long reads can also be used for other purposes, such as sequencing PCR amplicons that contain various features of interest. For instance, PacBio circular consensus sequences have been used to identify the mutations in influenza viruses in single cells (Russell et al., 2019) and to link barcodes to gene mutants in deep mutational scanning (Matreyek et al., 2018). For such applications, aligning the sequences to the targets may be fairly trivial, but it is not trivial to then parse specific features of interest (such as mutations, unique molecular identifiers, cell barcodes, and flanking sequences) from these alignments.
References
- GenBank. Nucleic Acids Research, 2020
- Single-Cell Virus Sequencing of Influenza Infections That Trigger Innate Immunity. Journal of Virology, 2019
- Multiplex assessment of protein variant abundance by massively parallel sequencing. Nature Genetics, 2018
- Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 2018
- A Sequel to Sanger: amplicon sequencing that scales. BMC Genomics, 2018
- PacBio Sequencing and its Applications. Genomics, Proteomics and Bioinformatics, 2015
- Deep mutational scanning: a new style of protein science. Nature Methods, 2014
- Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC Bioinformatics, 2012
- Parallel, tag-directed assembly of locally derived short sequence reads. Nature Methods, 2010