MALVIRUS: an integrated application for viral variant analysis
Open Access
- 19 April 2022
- journal article
- research article
- Published by Springer Science and Business Media LLC in BMC Bioinformatics
- Vol. 22 (S15), 1-16
- https://doi.org/10.1186/s12859-022-04668-0
Abstract
Efficiently calling variants from the growing volume of sequencing data produced daily from multiple viral strains is of the utmost importance for tracking the spread of those strains across the globe, as the COVID-19 pandemic demonstrated. We present MALVIRUS, an easy-to-install and easy-to-use application that assists users in multiple tasks required for the analysis of a viral population, such as SARS-CoV-2. MALVIRUS allows users to: (1) construct a variant catalog consisting of a set of variations (SNPs/indels) from the population sequences, (2) efficiently genotype and annotate the variants of the catalog that are supported by a read sample, and (3) when the viral species under consideration is SARS-CoV-2, assign the input sample to the most likely Pango lineages using the genotyped variations. Tests on Illumina and Nanopore samples showed the efficiency and effectiveness of MALVIRUS in analyzing SARS-CoV-2 strain samples, using both publicly available data provided by NCBI and the more complete dataset provided by GISAID. A comparison with state-of-the-art tools showed that MALVIRUS is always more precise and often has better recall.
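The precision and recall figures mentioned in the abstract follow the standard definitions for comparing a set of called variants against a reference (truth) set. The following is a minimal sketch of that computation, not the evaluation code used by the authors. The function name `precision_recall`, the `(position, ref, alt)` tuple representation and the toy variants are all assumptions made for illustration.

```python
def precision_recall(called, truth):
    """Return (precision, recall) of a call set against a truth set.

    Both arguments are iterables of hashable variant records; here a
    variant is assumed to be a (position, ref_allele, alt_allele) tuple.
    """
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    # Precision: fraction of called variants that are in the truth set.
    precision = true_positives / len(called) if called else 0.0
    # Recall: fraction of truth variants that were called.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy, made-up variants (not taken from any real sample).
truth = {(100, "C", "T"), (200, "A", "G"), (300, "G", "A"), (400, "T", "C")}
called = {(100, "C", "T"), (200, "A", "G"), (300, "G", "A"), (500, "C", "A")}

precision, recall = precision_recall(called, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case three of the four calls are correct and three of the four true variants are recovered, so both values are 0.75.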
Funding Information
- Università degli Studi di Milano-Bicocca (2019-ATE-0533)
- H2020 Marie Sklodowska-Curie Actions (872539)