A Survey and Comparative Study of Tweet Sentiment Analysis via Semi-Supervised Learning

29 June 2016

journal article
research article
Published by Association for Computing Machinery (ACM) in ACM Computing Surveys

Vol. 49 (1), 1-26
https://doi.org/10.1145/2932708

Abstract

Twitter is a microblogging platform in which users can post status messages, called “tweets,” to their friends. It has provided an enormous dataset of so-called sentiments, which can be classified through supervised learning. To build supervised learning models, classification algorithms require a set of representative labeled data. However, labeled data are usually difficult and expensive to obtain, which motivates the interest in semi-supervised learning. This type of learning uses unlabeled data to complement the information provided by the labeled data in the training process; therefore, it is particularly useful in applications such as tweet sentiment analysis, where a huge quantity of unlabeled data is accessible. Semi-supervised learning for tweet sentiment analysis, although appealing, is relatively new. We provide a comprehensive survey of semi-supervised approaches applied to tweet classification. Such approaches consist of graph-based, wrapper-based, and topic-based methods. A comparative study of algorithms based on self-training, co-training, topic modeling, and distant supervision highlights their biases and sheds light on aspects that the practitioner should consider in real-world applications.
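
As an illustration of how unlabeled data can complement labeled data, the following is a minimal sketch of self-training, one of the wrapper-style approaches named in the abstract. It is not the implementation evaluated in the survey. The base classifier (a small multinomial naive Bayes with add-one smoothing), the confidence threshold, and the function names `train`, `predict`, and `self_train` are all assumptions chosen for brevity.

```python
from collections import Counter
import math


def train(labeled):
    """Fit a multinomial naive Bayes model with add-one smoothing."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of training documents
    vocab = set()
    for text, label in labeled:
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        doc_counts[label] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return (most likely label, its posterior probability)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            if word in vocab:  # words unseen in training are ignored
                count = word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
        log_scores[label] = score
    # Normalize log scores into posterior probabilities.
    top = max(log_scores.values())
    exp_scores = {l: math.exp(s - top) for l, s in log_scores.items()}
    norm = sum(exp_scores.values())
    best = max(exp_scores, key=exp_scores.get)
    return best, exp_scores[best] / norm


def self_train(labeled, unlabeled, threshold=0.7, max_rounds=5):
    """Repeatedly pseudo-label confident unlabeled texts and retrain."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        confident, remaining = [], []
        for text in pool:
            label, prob = predict(model, text)
            if prob >= threshold:
                confident.append((text, label))
            else:
                remaining.append(text)
        if not confident:  # nothing new to learn from; stop early
            break
        labeled.extend(confident)
        pool = remaining
    return train(labeled), labeled


# Hypothetical toy data, for illustration only.
seed = [("love this great phone", "pos"), ("hate this awful phone", "neg")]
tweets = ["great love", "awful hate", "this phone"]
model, augmented = self_train(seed, tweets)
```

In this toy run, the two tweets containing clearly polarized words are pseudo-labeled and added to the training set, while the ambiguous one stays unlabeled. The threshold controls a trade-off: a lower value admits more pseudo-labels but risks reinforcing the classifier's own mistakes.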

Funding Information

CNPq (Proc. 303348/2013-5)
Brazilian Research Agencies CAPES (Proc. DS-7253238/D)
FAPESP (Proc. 2013/07375-0 and 2010/20830-0)
