Spam Comments Detection on Instagram Using Machine Learning and Deep Learning Methods

Abstract
The more popular a public figure on Instagram (IG), the number of followers also increase. When a public figure posts something, there are many comments from other users. In fact, from all the comments, not all of them are relevant to the post, such as advertising, links, or clickbait comments. The type of comments that are irrelevant to the post is usually called spam comments.  Spam comments will interfere with information flow and may lead to misleading information.  This research compares machine learning (ML) and deep learning (DL) classification methods based on our collected Indonesian IG spam comment dataset. This research was conducted in the following steps: dataset preparation, pre-processing, simple normalization, features generation using TF-IDF and word embedding, application of ML and DL classification methods, performance evaluation, and comparison. The authors compare accuracy, F-1, precision, and recall from ML and DL results. This research shows that ML and DL methods do not significantly differ. The Linear SVM, Extreme Tree (ET), Regression, and Stochastics Gradient Descent algorithms can reach the accuracy of 0.93. At the same time, the DL method has the highest accuracy of 0.94 using the SimpleTransformer BERT architecture.  The difference between ML and DL methods is not significantly different.