A Survey and Comparative Study of Tweet Sentiment Analysis via Semi-Supervised Learning
- 29 June 2016
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in ACM Computing Surveys
- Vol. 49 (1), 1-26
- https://doi.org/10.1145/2932708
Abstract
Twitter is a microblogging platform in which users can post status messages, called “tweets,” to their friends. It has provided an enormous dataset of the so-called sentiments, whose classification can take place through supervised learning. To build supervised learning models, classification algorithms require a set of representative labeled data. However, labeled data are usually difficult and expensive to obtain, which motivates the interest in semi-supervised learning. This type of learning uses unlabeled data to complement the information provided by the labeled data in the training process; therefore, it is particularly useful in applications including tweet sentiment analysis, where a huge quantity of unlabeled data is accessible. Semi-supervised learning for tweet sentiment analysis, although appealing, is relatively new. We provide a comprehensive survey of semi-supervised approaches applied to tweet classification. Such approaches consist of graph-based, wrapper-based, and topic-based methods. A comparative study of algorithms based on self-training, co-training, topic modeling, and distant supervision highlights their biases and sheds light on aspects that the practitioner should consider in real-world applications.
Funding Information
- CNPq (Proc. 303348/2013-5)
- Brazilian Research Agencies CAPES (Proc. DS-7253238/D)
- FAPESP (Proc. 2013/07375-0 and 2010/20830-0)