COVID-19 Tweets Classification Based on a Hybrid Word Embedding Method
Open Access
- 18 May 2022
- journal article
- research article
- Published by MDPI AG in Big Data and Cognitive Computing
- Vol. 6 (2), 58
- https://doi.org/10.3390/bdcc6020058
Abstract
In March 2020, the World Health Organisation declared COVID-19 a pandemic. The deadly virus spread rapidly and affected many countries around the world. During the outbreak, social media platforms such as Twitter generated massive amounts of valuable data that can support health-related decision making. We therefore propose analysing users' sentiments with effective supervised machine learning approaches to predict disease prevalence and provide early warnings. The collected tweets were preprocessed and categorised into three classes: negative, positive, and neutral. In the second phase, features were extracted from the posts using several widely used techniques, namely TF-IDF, Word2Vec, GloVe, and FastText, to build the feature datasets. The novelty of this study lies in hybrid feature extraction, which combines syntactic features (TF-IDF) with semantic features (FastText and GloVe) to represent posts more accurately and thereby improve classification. Experimental results show that the combination of FastText and TF-IDF performed best when used with SVM. SVM outperformed the other models with an accuracy of 88.72%, while XGBoost reached an accuracy of 85.29%. This study shows that hybrid methods are capable of extracting informative features from tweets and of increasing classification performance.
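The abstract does not specify how the syntactic and semantic features are merged, so the following is only a minimal sketch under one assumption: each tweet's TF-IDF vector is concatenated with the average of its word embeddings. The 3-dimensional vectors in `TOY_EMBEDDINGS` are hypothetical stand-ins for pretrained FastText or GloVe vectors, and all helper names (`tfidf_vector`, `mean_embedding`, `hybrid_features`, etc.) are illustrative rather than taken from the paper. The authors may instead have used another scheme, such as TF-IDF-weighted averaging of embeddings.

```python
import math
from collections import Counter

# Hypothetical 3-dimensional word vectors standing in for pretrained
# FastText or GloVe embeddings (real ones typically have 100-300 dimensions).
TOY_EMBEDDINGS = {
    "vaccine": [0.9, 0.1, 0.0],
    "safe": [0.7, 0.6, 0.1],
    "lockdown": [0.1, 0.8, 0.3],
    "scary": [0.0, 0.2, 0.9],
}

def tokenize(text):
    """Lowercase and split on whitespace (real pipelines also strip URLs, mentions, etc.)."""
    return text.lower().split()

def build_vocab(docs):
    """Sorted list of all tokens in the corpus, giving a fixed column order."""
    return sorted({tok for doc in docs for tok in tokenize(doc)})

def idf_weights(docs, vocab):
    """Smoothed IDF: ln((1 + N) / (1 + df)) + 1, where df counts documents containing the token."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + df[tok])) + 1 for tok in vocab}

def tfidf_vector(doc, vocab, idf):
    """Syntactic view: term frequency (count / length) times IDF, one entry per vocabulary word."""
    tokens = tokenize(doc)
    counts = Counter(tokens)
    length = len(tokens) or 1
    return [counts[tok] / length * idf[tok] for tok in vocab]

def mean_embedding(doc, embeddings, dim):
    """Semantic view: average the vectors of known tokens; zeros if none are known."""
    vecs = [embeddings[t] for t in tokenize(doc) if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def hybrid_features(doc, vocab, idf, embeddings, dim=3):
    """Assumed hybrid representation: concatenate the TF-IDF and embedding views."""
    return tfidf_vector(doc, vocab, idf) + mean_embedding(doc, embeddings, dim)

tweets = [
    "the vaccine is safe",
    "lockdown again so scary",
    "vaccine rollout starts today",
]
vocab = build_vocab(tweets)
idf = idf_weights(tweets, vocab)
features = [hybrid_features(t, vocab, idf, TOY_EMBEDDINGS) for t in tweets]
print(len(vocab), len(features[0]))  # prints: 11 14
```

Each resulting row (11 TF-IDF columns plus 3 embedding columns here) could then be passed to a classifier such as an SVM. Concatenation keeps the sparse lexical signal and the dense semantic signal side by side, so the classifier can weight each view independently.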