Decision tree algorithm for multi-label hate speech and abusive language detection in Indonesian Twitter
Open Access
- 8 August 2021
- journal article
- Published by Institute of Research and Community Services Diponegoro University (LPPM UNDIP) in Jurnal Teknologi dan Sistem Komputer
- Vol. 9 (4), 199-204
- https://doi.org/10.14710/jtsiskom.2021.13907
Abstract
Hate speech and abusive language are easy to find in written communication on social media such as Twitter. They often cause disputes between the parties involved, namely the victims and the people who wrote the tweets. It is also difficult for those who take sides to judge whether a tweet contains hate speech, abusive language, or both. This research aims to develop a method that classifies tweets as abusive and/or as containing hate speech. If hate speech is detected, the system also measures its severity level. The dataset consists of 13,126 real tweets. Word embeddings are used to turn the text input into features, and a Decision Tree algorithm is used to classify the tweets. Feature engineering and parameter tuning improved the classification of the three classes: hate speech, abusive words, and hate speech level. In the Decision Tree classification, the lexicon feature produced the highest accuracy for detecting the three classes, compared with the special features and the textual features. The average accuracy over the three classes increased from 69.77% to 70.48% for a 90:10 training-testing split, and from 69.35% to 69.54% for an 80:20 split.
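The abstract names a lexicon feature, two training-testing compositions, and an accuracy averaged over three classes. The sketch below illustrates how such quantities could be computed. It is not the authors' code. The label names, the whitespace tokenizer, the ordered (unshuffled) split, and the reading of "average accuracy" as an unweighted mean of per-label accuracies are all assumptions, and the Decision Tree classifier itself is left out.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical names for the three targets listed in the abstract.
LABELS = ("hate_speech", "abusive", "hate_speech_level")


def lexicon_count(tweet: str, lexicon: Set[str]) -> int:
    """Count whitespace-separated tokens of `tweet` that occur in `lexicon`.

    Assumption: a simple lowercase match; the paper's lexicon feature
    may be defined differently.
    """
    return sum(1 for token in tweet.lower().split() if token in lexicon)


def ordered_split(rows: Sequence, train_fraction: float) -> Tuple[list, list]:
    """Split rows in order: the first `train_fraction` become training data."""
    cut = int(len(rows) * train_fraction)
    return list(rows[:cut]), list(rows[cut:])


def average_label_accuracy(
    y_true: List[Dict[str, int]], y_pred: List[Dict[str, int]]
) -> float:
    """Compute accuracy per label, then the unweighted mean over LABELS."""
    per_label = []
    for label in LABELS:
        correct = sum(1 for t, p in zip(y_true, y_pred) if t[label] == p[label])
        per_label.append(correct / len(y_true))
    return sum(per_label) / len(per_label)
```

In a fuller pipeline, the lexicon count would be appended to each tweet's embedding vector before training, and `average_label_accuracy` would be evaluated on the held-out part of a 0.9 or 0.8 split.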
Funding Information
- UIN Sultan Syarif Kasim Riau