Novel Approach for Generating Hybrid Features Set to Effectively Identify Hate Speech
Open Access
- 1 January 2020
- journal article
- research article
- Published by IBERAMIA: Sociedad Iberoamericana de Inteligencia Artificial in INTELIGENCIA ARTIFICIAL
- Vol. 23 (66), 97-111
- https://doi.org/10.4114/intartif.vol23iss66pp97-111
Abstract
Automated detection of hate speech and other inappropriate text on social media and other internet platforms has attracted growing interest in recent years and has become a valuable research topic for both industry and academia. Applications increasingly need to identify disruptive content, analyze sentiment, and detect cyberbullying, flaming, threats, and hatred directed at individuals or particular communities or groups. Text classification is challenging because of the nature and complexity of language, especially the context, micro words, emojis, typos, and sarcasm present in the text. In this paper, we propose a model with a novel approach for generating hybrid features that yield an effective feature representation for classifying hate speech. We combine features learned by deep learning methods with semantic features such as word n-grams and tweet-specific syntactic features to form hybrid feature sets. We also refine the preprocessing steps to reduce the number of missing embeddings, which enlarges the usable vocabulary for efficient feature learning. We experiment with various neural networks for feature learning and with machine learning models that use the hybrid features for classification. Our work delivers hybrid features and appropriate preprocessing techniques for efficient classification of the standard dataset of 16k annotated hate speech tweets. The combination of a Long Short-Term Memory (LSTM) network trained on random embeddings for deep feature extraction and Logistic Regression (LR) as the classifier on the hybrid features is found to be the best model, and it outperforms the state of the art reported in the literature.
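The hybrid feature idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact preprocessing rules, syntactic features, or vector sizes, so everything below is an assumption made for illustration: the function names (`preprocess`, `syntactic_features`, `hybrid_features`), the specific normalization rules, and the choice of four tweet-level counts are hypothetical. The `deep_vector` argument stands in for features that would be learned by a network such as an LSTM; here it is simply passed in as a list of floats.

```python
import re
from collections import Counter


def preprocess(tweet):
    """Normalize a tweet into tokens (illustrative rules, not the paper's)."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " <url> ", text)   # replace links
    text = re.sub(r"@\w+", " <user> ", text)          # replace mentions
    text = re.sub(r"#(\w+)", r" \1 ", text)           # keep hashtag word, drop '#'
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)        # "soooo" -> "soo"
    return re.findall(r"<\w+>|[a-z0-9']+", text)


def syntactic_features(tweet):
    """Tweet-specific counts: hashtags, mentions, URLs, exclamation marks."""
    return [
        float(len(re.findall(r"#\w+", tweet))),
        float(len(re.findall(r"@\w+", tweet))),
        float(len(re.findall(r"https?://\S+", tweet))),
        float(tweet.count("!")),
    ]


def word_ngrams(tokens, n_max=2):
    """All word n-grams for n = 1..n_max, joined with spaces."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def build_vocab(token_lists, n_max=2):
    """Map each distinct n-gram in the corpus to a column index."""
    vocab = {}
    for tokens in token_lists:
        for gram in word_ngrams(tokens, n_max):
            vocab.setdefault(gram, len(vocab))
    return vocab


def ngram_vector(tokens, vocab, n_max=2):
    """Count vector over the n-gram vocabulary; unseen n-grams are ignored."""
    vec = [0.0] * len(vocab)
    for gram, count in Counter(word_ngrams(tokens, n_max)).items():
        if gram in vocab:
            vec[vocab[gram]] = float(count)
    return vec


def hybrid_features(tweet, deep_vector, vocab, n_max=2):
    """Concatenate learned features, n-gram counts, and syntactic counts."""
    tokens = preprocess(tweet)
    return list(deep_vector) + ngram_vector(tokens, vocab, n_max) + syntactic_features(tweet)
```

In a full pipeline, the concatenated vector for each tweet would be passed to a classifier such as Logistic Regression. The sketch only shows how the three feature groups can be joined into one representation.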