NEXT MUTATION PREDICTION OF SARS-COV-2 SPIKE PROTEIN SEQUENCE USING ENCODER-DECODER BASED LONG SHORT TERM MEMORY (LSTM) METHOD

Abstract
The world is facing a pandemic caused by a coronavirus, SARS-CoV-2. The virus mutates rapidly, which worsens the situation in every affected country. Managing the virus remains challenging because there is still no permanent remedy. Doctors, engineers, and scientists are working together to fight it. The sequencing of the viral genome and the determination of its overall structure have paved the way for further research, and many researchers are working on mutation analysis. Because the spike (S) protein is one of the most important components of SARS-CoV-2 for infecting humans, it is a primary target for vaccine and drug discovery. Many machine learning, artificial intelligence, and deep learning methods have been applied to genome datasets to detect mutation positions and to predict further insights. The goal of this work is to predict the most probable next-generation spike protein sequence of SARS-CoV-2. We propose an encoder-decoder LSTM model trained on S-protein sequence data ordered by date. The model was effective at predicting the next-generation S-protein sequence. We compared it with two other deep learning models, CNN-LSTM and attention-based LSTM. We also evaluated our model on both large and small datasets, and it was effective and efficient in both cases.
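
To illustrate the date-wise ordering mentioned above, the following minimal sketch shows one way such data could be arranged into (input, target) pairs for a sequence-to-sequence model. The record format, the helper names `encode` and `make_training_pairs`, the padding index, and the toy sequences are all assumptions made for illustration; they are not taken from this work's dataset or pipeline.

```python
from datetime import date

# The 20 standard amino acids. Index 0 is reserved for padding or
# unknown symbols (an assumption made for this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence):
    """Map each residue letter to an integer index (0 for unknown symbols)."""
    return [AA_TO_INDEX.get(residue, 0) for residue in sequence]


def make_training_pairs(records):
    """Sort (collection_date, sequence) records by date and pair each
    sequence with the next one in time: input = earlier, target = later."""
    ordered = sorted(records, key=lambda record: record[0])
    sequences = [encode(seq) for _, seq in ordered]
    return list(zip(sequences[:-1], sequences[1:]))


# Toy, made-up fragments (not real spike protein data).
toy_records = [
    (date(2020, 3, 1), "MFVFLVLLPL"),
    (date(2020, 1, 15), "MFVFLVLLPA"),
    (date(2020, 5, 20), "MFVFFVLLPL"),
]
pairs = make_training_pairs(toy_records)
```

In a setup like this, the first element of each pair would feed an encoder and the second would serve as the decoder target, typically after padding all sequences to a common length.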