A Three Word-Level Approach Used in Machine Learning for Romanian Sentiment Analysis

Abstract
In this paper, we propose a new approach to evaluating feelings, emotions, and opinions expressed online. The approach classifies texts by polarity and addresses several relevant sentiment analysis challenges, improving the reliability of the resulting analysis. Specifically, we propose a semi-supervised machine learning system based on a taxonomy of emotionally charged words with three classes, one of which is the neutral polarity class, and we compare its results with those of several classification algorithms, such as Naïve Bayes, Decision Trees, and Support Vector Machines. We also discuss aspects of natural language complexity from the sentiment perspective and present language resources that sentiment analysis systems can use: a Romanian corpus of 25,841 news items and a Romanian language dictionary containing 42,497 words. The aim of the paper is to assess the utility and usability of multiple data sets in a real-world application and to obtain relevant results about how online comments are perceived, results that may offer valuable future perspectives on a nation's social and even political issues. After processing the corpus, our system achieves a best result of over 82%, compared with a best score of 70% for the classical machine learning classification methods.
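To make the idea of a three-class, word-level polarity taxonomy concrete, the sketch below assigns each known word to a positive, neutral, or negative class and labels a text by comparing the counts. It is a toy illustration under stated assumptions, not the authors' system: the lexicon `LEXICON`, its Romanian entries, the helper names `tokenize`, `word_polarity_counts`, and `classify`, and the decision rule are all hypothetical, and the semi-supervised training step described in the paper is not modeled.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (NOT the paper's 42,497-word dictionary):
# each word carries one of three word-level classes,
# +1 = positive, 0 = neutral, -1 = negative.
LEXICON = {
    "bun": 1, "excelent": 1, "frumos": 1,
    "rău": -1, "prost": -1, "corupție": -1,
    "guvern": 0, "lege": 0, "stat": 0,
}

LABELS = {1: "positive", 0: "neutral", -1: "negative"}


def tokenize(text: str) -> list:
    # Lowercase and split on Unicode word characters, so Romanian
    # letters such as ă, â, î, ș, ț stay inside their words.
    return re.findall(r"\w+", text.lower())


def word_polarity_counts(text: str) -> Counter:
    # Count how many lexicon words fall into each of the three classes;
    # words missing from the lexicon are ignored.
    counts = Counter()
    for token in tokenize(text):
        if token in LEXICON:
            counts[LABELS[LEXICON[token]]] += 1
    return counts


def classify(text: str) -> str:
    # Assumed decision rule: a text is neutral unless one emotionally
    # charged class outweighs the other. Neutral words are counted for
    # inspection but do not change the outcome under this rule.
    counts = word_polarity_counts(text)
    if counts["positive"] > counts["negative"]:
        return "positive"
    if counts["negative"] > counts["positive"]:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sentence in ["Un guvern bun și o lege", "Un stat prost", "bun dar prost"]:
        print(sentence, "->", classify(sentence))
```

Keeping an explicit neutral class, rather than forcing every text into positive or negative, lets ties and texts without charged words fall into their own category, which mirrors the motivation given above for including the neutral polarity class.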