Detecting deceptive reviews using lexical and syntactic features

1 December 2013

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 53-58
https://doi.org/10.1109/isda.2013.6920707

Abstract

Deceptive opinion classification has attracted considerable research interest due to the rapid growth in the number of social media users. Despite the availability of a vast number of opinion features and classification techniques, review classification remains a challenging task. In this work we apply stylometric features, i.e. lexical and syntactic features, with supervised machine learning classifiers, i.e. Support Vector Machine (SVM) trained with Sequential Minimal Optimization (SMO) and Naive Bayes, to detect deceptive opinions. Detecting deceptive opinions is difficult for a human reader because spammers try to write carefully crafted reviews, and doing so changes their writing style and word usage. Hence, considering stylometric features helps to distinguish the spammer's writing style and find deceptive reviews. Experiments on an existing hotel review corpus suggest that using stylometric features is a promising approach for detecting deceptive opinions.
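
As a rough illustration of what lexical stylometric features can look like, the following Python sketch computes a handful of per-review statistics. The function name `lexical_features` and the particular features chosen (word count, average word length, type-token ratio, average sentence length, punctuation ratio) are assumptions made for this example; the abstract does not list the paper's actual feature set, and the classification step is not shown.

```python
import re
import string


def lexical_features(text):
    """Return a few simple lexical style features for one review.

    Hypothetical feature set for illustration only; not the paper's.
    """
    # Lowercased word tokens (letters and apostrophes only).
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Naive sentence split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    n_chars = len(text)
    return {
        "word_count": n_words,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        # Vocabulary richness: distinct words divided by total words.
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        "punctuation_ratio": (
            sum(c in string.punctuation for c in text) / n_chars if n_chars else 0.0
        ),
    }


features = lexical_features("Great hotel. Great staff!")
```

Feature dictionaries of this kind could be turned into numeric vectors and passed to a supervised classifier such as an SVM or Naive Bayes model, which is the general setup the abstract describes.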


Cited by 75 articles