Robust Mutation Profiling of SARS-CoV-2 Variants from Multiple Raw Illumina Sequencing Data with Cloud Workflow
Open Access
- 13 April 2022
- Vol. 13 (4), 686
- https://doi.org/10.3390/genes13040686
Abstract
Several variants of the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) are emerging all over the world. Variant surveillance through genome sequencing has become crucial for determining whether mutations in these variants render the virus more infectious, more potent, or more resistant to existing vaccines and therapeutics. Meanwhile, repeatedly analyzing many raw sequencing datasets with currently available code-based bioinformatics tools is tremendously challenging during this unprecedented pandemic because experts and computational resources are limited. Therefore, to hasten variant surveillance efforts, we developed an installation-free cloud workflow for robust mutation profiling of SARS-CoV-2 variants from multiple Illumina sequencing datasets. Herein, 55 raw sequencing datasets representing four early SARS-CoV-2 variants of concern (Alpha, Beta, Gamma, and Delta) from an open-access database were used to test the workflow's performance. The workflow automatically identified the mutated sites of the variants and reliably annotated the protein-coding genes in a cost-effective and timely manner, processing all samples in one execution by harnessing parallel cloud computing under resource-limited settings. In addition, the workflow can generate a consensus genome sequence that can be shared with others in public data repositories to support global variant surveillance efforts.
Funding Information
- Ministry of Science and Technology (MOST 109-2221-E-038-016)
- Taipei Medical University Hospital (W0303, 109TMUH-SP-02)
- National Institutes of Health (HHSN261201400008C)