Sentiment classification for Chinese reviews based on key substring features

1 September 2009

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Abstract

One of the most widely studied sub-problems of opinion mining is sentiment classification, which labels evaluative texts as positive or negative so that the viewpoints underlying online user-generated content can be identified automatically. Most existing sentiment classification methods ignore both word order and the structural information contained in unlabeled test documents. This paper proposes a transductive-learning-based algorithm that takes both types of information into account. The algorithm first selects key substrings from a suffix tree built over the strings of the training documents and the unlabeled test documents, and then converts each original document into a bag of occurrence counts of those key substrings. Finally, an SVM classifies the converted documents. Experiments on an open dataset of 16,000 Chinese reviews show promising performance, with accuracy above 93.15%, which is considerably better than existing sentiment classification methods such as n-gram-feature-based classifiers. The experimental results also show that “tfidf-c” performs much better than other term-weighting approaches for sentiment classification on a large text corpus. The paper further analyzes the reasons behind the algorithm's strong performance. Moreover, the algorithm avoids the messy and rather artificial problem of defining word boundaries in Chinese.
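The feature pipeline summarized above can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the paper's implementation. Instead of building a suffix tree (for example with Ukkonen's online construction), it simply enumerates all substrings up to a fixed length. The rule for choosing "key" substrings (keep substrings that occur in at least `min_df` documents) is a hypothetical stand-in for the paper's actual criterion. Selection runs over training and unlabeled test texts together, which is the transductive part: the test documents' text shapes the feature set, but their labels are never used. We also assume that "tfidf-c" means tf·idf weighting followed by cosine (L2) normalization of each document vector. The names `substring_counts`, `select_key_substrings` and `tfidf_c` are illustrative only.

```python
import math
from collections import Counter


def substring_counts(text: str, max_len: int) -> Counter:
    """Count every (overlapping) substring of `text` of length 1..max_len.

    Stand-in for traversing a suffix tree: plain enumeration, capped at max_len.
    """
    counts = Counter()
    for i in range(len(text)):
        for j in range(i + 1, min(i + max_len, len(text)) + 1):
            counts[text[i:j]] += 1
    return counts


def select_key_substrings(docs, max_len=4, min_len=2, min_df=2):
    """Hypothetical key-substring rule: keep substrings of length
    min_len..max_len that occur in at least `min_df` documents.

    `docs` should contain training AND unlabeled test texts (transductive
    setting); only the raw text is used, never any test labels.
    """
    df = Counter()
    for doc in docs:
        # Use a set so each substring is counted once per document.
        df.update({s for s in substring_counts(doc, max_len) if len(s) >= min_len})
    return sorted(s for s, n in df.items() if n >= min_df)


def tfidf_c(docs, vocab):
    """Assumed tfidf-c: raw count * log(N / df), then L2 (cosine) normalization."""
    n_docs = len(docs)
    max_len = max((len(s) for s in vocab), default=1)
    # Bag of key-substring counts: one row per document.
    tf_rows = []
    for doc in docs:
        counts = substring_counts(doc, max_len)
        tf_rows.append([counts[s] for s in vocab])
    # Document frequency of each key substring within `docs`.
    df = [sum(1 for row in tf_rows if row[k] > 0) for k in range(len(vocab))]
    vectors = []
    for row in tf_rows:
        weights = [tf * math.log(n_docs / d) if d else 0.0 for tf, d in zip(row, df)]
        norm = math.sqrt(sum(w * w for w in weights))
        vectors.append([w / norm for w in weights] if norm else weights)
    return vectors


if __name__ == "__main__":
    # Toy texts for illustration, not taken from the paper's dataset.
    docs = ["质量很好", "质量很差", "服务很好"]
    vocab = select_key_substrings(docs, max_len=2, min_len=2, min_df=2)
    print(vocab)
    for vec in tfidf_c(docs, vocab):
        print([round(x, 3) for x in vec])
```

The resulting unit-length vectors would then be passed to an SVM classifier (for example a linear SVM); that step is omitted to keep the sketch free of third-party dependencies. Because the text is processed character by character, no Chinese word segmentation is required, which mirrors the word-boundary point made in the abstract.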
