Detecting deceptive reviews using lexical and syntactic features
- 1 December 2013
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Deceptive opinion classification has attracted considerable research interest owing to the rapid growth in the number of social media users. Despite the availability of a vast number of opinion features and classification techniques, review classification remains a challenging task. In this work we apply stylometric features, i.e. lexical and syntactic features, with supervised machine learning classifiers, i.e. Support Vector Machine (SVM) with Sequential Minimal Optimization (SMO) and Naive Bayes, to detect deceptive opinions. Detecting deceptive opinions is difficult for a human reader because spammers try to write convincing reviews, and that effort changes their writing style and word usage. Hence, considering stylometric features helps to distinguish the spammers' writing style and so to find deceptive reviews. Experiments on an existing hotel review corpus suggest that using stylometric features is a promising approach for detecting deceptive opinions.
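To make the idea of stylometric features concrete, the following is a minimal Python sketch of how a few lexical measures (average word length, type-token ratio) and simple syntactic proxies (function-word and punctuation rates, average sentence length) might be computed for one review. The function name `stylometric_features`, the `FUNCTION_WORDS` list, and the particular measures are illustrative assumptions; the abstract does not specify the paper's actual feature set.

```python
import re
import string

# Illustrative function-word list (an assumption, not the paper's feature set).
FUNCTION_WORDS = ("the", "a", "an", "of", "in", "and", "but", "i", "we", "my", "very")


def stylometric_features(text):
    """Return a dict of simple lexical and syntactic-proxy features for one review."""
    # Lowercased word tokens; apostrophes are kept so contractions stay whole.
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Rough sentence split on terminal punctuation, dropping empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    # Guard against division by zero on empty input.
    n_words = max(len(words), 1)
    n_chars = max(len(text), 1)

    features = {
        # Lexical features
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "type_token_ratio": len(set(words)) / n_words,
        # Syntactic proxies
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "punctuation_rate": sum(c in string.punctuation for c in text) / n_chars,
    }
    # Relative frequency of each function word.
    for fw in FUNCTION_WORDS:
        features["fw_" + fw] = words.count(fw) / n_words
    return features


if __name__ == "__main__":
    review = "The room was great. The staff was great!"
    for name, value in stylometric_features(review).items():
        print(name, round(value, 3))
```

Feature vectors of this kind, computed for each labelled review, could then be passed to a supervised classifier such as an SVM or Naive Bayes, which is the general setup the abstract describes.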