COVID-19 Tweets Classification Based on a Hybrid Word Embedding Method

Open Access

18 May 2022

journal article
research article
Published by MDPI AG in Big Data and Cognitive Computing

Vol. 6 (2), 58
https://doi.org/10.3390/bdcc6020058

Abstract

In March 2020, the World Health Organisation declared COVID-19 a pandemic. The virus spread to and affected many countries around the world. During the outbreak, social media platforms such as Twitter provided large amounts of valuable data that could inform health-related decision making. We therefore propose analysing users’ sentiments with supervised machine learning approaches to predict disease prevalence and provide early warnings. The collected tweets were preprocessed and categorised as negative, positive, or neutral. In the second phase, features were extracted from the posts using several widely used techniques, such as TF-IDF, Word2Vec, GloVe, and FastText. The novelty of this study lies in hybrid feature extraction: we combined syntactic features (TF-IDF) with semantic features (FastText and GloVe) to represent posts more accurately, which helps improve classification. Experimental results show that FastText combined with TF-IDF performed better with SVM than the other models. SVM outperformed the other models with an accuracy of 88.72%, and XGBoost reached an accuracy of 85.29%. This study shows that the hybrid methods can extract features from tweets effectively and improve classification performance.
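
The hybrid representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "combining" means concatenating a tweet's TF-IDF vector with the average of its word vectors; the abstract does not state the exact combination rule. The tweets, the 3-dimensional `toy_embeddings` table (a stand-in for pretrained FastText or GloVe vectors), and all function names are hypothetical.

```python
import math
from collections import Counter

# Toy tweets, made up for illustration (not from the study's dataset).
tweets = [
    "vaccine rollout is great news",
    "lockdown again terrible news",
    "cases reported today",
]

# Hypothetical 3-dimensional word vectors standing in for pretrained
# FastText or GloVe embeddings.
toy_embeddings = {
    "vaccine": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "lockdown": [0.1, 0.9, 0.0],
    "terrible": [0.0, 0.8, 0.2],
    "news": [0.4, 0.4, 0.2],
    "cases": [0.2, 0.3, 0.7],
}
EMBED_DIM = 3


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def idf_weights(docs, vocab):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + doc_freq[tok])) + 1.0 for tok in vocab}


def tfidf_vector(doc, vocab, idf):
    """Term count times IDF for each vocabulary term, scaled to unit L2 norm."""
    counts = Counter(tokenize(doc))
    raw = [counts[tok] * idf[tok] for tok in vocab]
    norm = math.sqrt(sum(v * v for v in raw)) or 1.0
    return [v / norm for v in raw]


def mean_embedding(doc, embeddings, dim):
    """Average the vectors of known tokens; zeros if no token is known."""
    vectors = [embeddings[tok] for tok in tokenize(doc) if tok in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def hybrid_features(docs, embeddings, dim):
    """Concatenate syntactic (TF-IDF) and semantic (embedding) features."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    idf = idf_weights(docs, vocab)
    rows = [
        tfidf_vector(doc, vocab, idf) + mean_embedding(doc, embeddings, dim)
        for doc in docs
    ]
    return rows, vocab


features, vocab = hybrid_features(tweets, toy_embeddings, EMBED_DIM)
print(len(vocab), len(features[0]))
```

Each row of `features` holds one tweet's TF-IDF weights followed by its averaged embedding, so its length is the vocabulary size plus the embedding dimension. In a fuller pipeline, these rows and the sentiment labels would be passed to a supervised classifier such as an SVM.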
