Sentiment analysis in multiple languages
- 20 June 2008
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Information Systems
- Vol. 26 (3), 1-34
- https://doi.org/10.1145/1361684.1361685
Abstract
The Internet is frequently used as a medium for exchange of information and opinions, as well as propaganda dissemination. In this study the use of sentiment analysis methodologies is proposed for classification of Web forum opinions in multiple languages. The utility of stylistic and syntactic features is evaluated for sentiment classification of English and Arabic content. Specific feature extraction components are integrated to account for the linguistic characteristics of Arabic. The entropy weighted genetic algorithm (EWGA) is also developed, which is a hybridized genetic algorithm that incorporates the information-gain heuristic for feature selection. EWGA is designed to improve performance and get a better assessment of key features. The proposed features and techniques are evaluated on a benchmark movie review dataset and U.S. and Middle Eastern Web forum postings. The experimental results using EWGA with SVM indicate high performance levels, with accuracies of over 91% on the benchmark dataset as well as the U.S. and Middle Eastern forums. Stylistic features significantly enhanced performance across all testbeds while EWGA also outperformed other feature selection methods, indicating the utility of these features and techniques for document-level classification of sentiments.
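The abstract describes EWGA only at a high level, as a genetic algorithm hybridized with the information-gain heuristic. The sketch below illustrates that general idea and is not the authors' algorithm. It assumes binary features, uses information gain to bias both the initial population and mutation, and substitutes a leave-one-out nearest-neighbour score for the SVM accuracy used in the paper. All function names, parameters, and the inclusion-probability formula are hypothetical.

```python
import math
import random


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(column, labels):
    """Reduction in label entropy after splitting on one discrete feature."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain


def loo_accuracy(mask, X, y):
    """Assumed fitness: leave-one-out 1-NN accuracy (Hamming distance) on the
    selected features. A stand-in for the classifier accuracy in the paper."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    correct = 0
    for a in range(len(X)):
        best_d, best_label = None, None
        for b in range(len(X)):
            if a == b:
                continue
            d = sum(X[a][j] != X[b][j] for j in idx)
            if best_d is None or d < best_d:
                best_d, best_label = d, y[b]
        correct += best_label == y[a]
    return correct / len(X)


def ig_guided_ga(X, y, generations=20, pop_size=12, mutation_rate=0.1, seed=0):
    """Feature-selection GA biased by information gain (needs >= 2 features)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    gains = [information_gain([row[j] for row in X], y) for j in range(n_features)]
    top = max(gains) or 1.0
    # Assumption: a feature's inclusion probability grows linearly with its gain.
    include_p = [0.1 + 0.8 * g / top for g in gains]

    def score(mask):
        return loo_accuracy(mask, X, y)

    population = [[int(rng.random() < p) for p in include_p] for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            # Binary tournament selection for each parent.
            p1 = max(rng.sample(population, 2), key=score)
            p2 = max(rng.sample(population, 2), key=score)
            cut = rng.randrange(1, n_features)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            for j in range(n_features):
                if rng.random() < mutation_rate:
                    # Gain-guided mutation: resample the bit from include_p.
                    child[j] = int(rng.random() < include_p[j])
            children.append(child)
        population = children
        best = max(population, key=score)
    return best, gains
```

On a toy matrix whose first column equals the class label and whose other columns carry no information, `ig_guided_ga` should favour masks that keep the first feature. Swapping `loo_accuracy` for cross-validated SVM accuracy would bring the sketch closer to the setup the abstract reports, at a higher computational cost.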