Progressive domain adaptation for detecting hate speech on social media with small training set and its application to COVID-19 concerned posts

29 July 2021

journal article
research article
Published by Springer Science and Business Media LLC in Social Network Analysis and Mining

Vol. 11 (1), 1-18
https://doi.org/10.1007/s13278-021-00780-w

Abstract

In the current era of information and experience, microblogging sites are commonly used to express people's feelings, including fear, panic, hate and abuse. Monitoring and controlling abuse on social media, especially during pandemics such as COVID-19, can help keep public sentiment and morale positive. Developing machine-learning-based methods for detecting fear and hate requires labelled data. However, obtaining labelled data in suddenly changed circumstances such as a pandemic is expensive, and acquiring it in a short time is impractical. Related labelled hate data from other domains or previous incidents may be available. However, the predictive accuracy of hate detection models trained on such data decreases significantly if the data distribution of the target domain, where the prediction will be applied, is different. To address this problem, we propose a novel concept of unsupervised progressive domain adaptation based on a deep-learning language model generated from multiple text datasets. We showcase the efficacy of the proposed method for hate speech and fear detection on a collection of tweets posted during COVID-19, for which labelled information is unavailable.
