A k-mer based metaheuristic approach for detecting COVID-19 variants
Open Access
- 23 March 2023
- journal article
- Published by Dicle Universitesi Muhendislik Fakultesi Muhendislik Dergisi in DÜMF Mühendislik Dergisi
- Vol. 14 (1), 17-26
- https://doi.org/10.24012/dumf.1195600
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) belongs to the Coronaviridae family. A change in its genetic sequence is called a mutation, and mutations give rise to variants of SARS-CoV-2. In this paper, we propose a novel and efficient method to predict SARS-CoV-2 variants of concern from whole human genome sequences. The method uses 16 dinucleotide and 64 trinucleotide features to differentiate the variants of concern. The efficacy of the proposed features is demonstrated with four classifiers: k-nearest neighbor, support vector machine (SVM), multilayer perceptron, and random forest. The method is evaluated on a dataset of 223,326 complete human genome sequences that covers the recently designated variants of concern: Alpha, Beta, Gamma, Delta, and Omicron. Experimental results show that overall accuracy for detecting variants of concern increases markedly when trinucleotide features are used instead of dinucleotide features. Furthermore, we use the whale optimization algorithm, a state-of-the-art method, to reduce the number of features and choose the most relevant ones. It selects 44 of the 64 trinucleotide features while still differentiating the variants with acceptable accuracy. With the selected features, the SVM classifier achieves about 99% accuracy, sensitivity, specificity, and precision on average, so the proposed method detects SARS-CoV-2 variants with high accuracy.
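The 16 dinucleotide and 64 trinucleotide features mentioned in the abstract correspond to all possible k-mers over the alphabet A, C, G, T for k = 2 and k = 3 (4^2 = 16, 4^3 = 64). The following is a minimal sketch of how such a feature vector could be computed. The abstract does not say whether the features are raw counts or normalized frequencies, nor how ambiguous bases are handled, so the normalization, the skipping of windows that contain non-ACGT characters, the function name `kmer_features`, and the toy sequence are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_features(sequence: str, k: int) -> list[float]:
    """Return normalized frequencies of all 4**k k-mers in lexicographic order.

    Assumption: overlapping windows are used, and windows containing any
    character outside A/C/G/T (for example N) are ignored.
    """
    seq = sequence.upper()
    # Fixed vocabulary so every sequence maps to a vector of the same length.
    vocab = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only windows that are valid k-mers contribute to the total.
    total = sum(counts[kmer] for kmer in vocab)
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[kmer] / total for kmer in vocab]


# Made-up fragment for illustration only; not taken from the dataset.
toy = "ATGTTTGTTTTTCTTGTTNTTATTGCC"
di = kmer_features(toy, 2)   # dinucleotide feature vector
tri = kmer_features(toy, 3)  # trinucleotide feature vector
print(len(di), len(tri))
```

Each genome then becomes a fixed-length numeric vector that can be passed to any of the four classifiers, and a feature selection step such as the whale optimization algorithm would operate on the columns of the 64-dimensional trinucleotide matrix.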