Toward a Language Modeling Approach for Consumer Review Spam Detection

Abstract

Numerous reports have indicated the severity of fake reviews (i.e., spam) posted to various e-Commerce or opinion-sharing Web sites. Nevertheless, very few studies have examined the trustworthiness of online consumer reviews, because an effective computational methodology has been lacking. Unlike other kinds of Web spam, untruthful reviews can look just like legitimate reviews (i.e., ham), so it is difficult to find features that distinguish the two classes. One main contribution of our research work is the development of a novel computational methodology to combat online review spam. Our experimental results confirm that the proposed computational model, based on KL divergence and probabilistic language modeling, is effective for the detection of untruthful reviews. Empowered by the proposed computational methods, our empirical study found that around 2% of the consumer reviews posted to a large e-Commerce site are spam.
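
The abstract names KL divergence over probabilistic language models but gives no formulation, so the following is only a minimal sketch of the general idea, not the authors' model. It assumes unigram language models with additive smoothing over the shared vocabulary of two reviews and a symmetrised KL divergence as the dissimilarity score; the function names, the whitespace tokenisation, and the smoothing parameter `alpha` are all illustrative assumptions.

```python
import math
from collections import Counter


def unigram_model(text, vocab, alpha=1.0):
    """Additively smoothed unigram distribution over a fixed vocabulary.

    Assumption: lowercase whitespace tokenisation; a real system would
    use a proper tokenizer and a more careful smoothing scheme.
    """
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def symmetric_kl(text_a, text_b, alpha=1.0):
    """Symmetrised KL divergence between the unigram models of two texts.

    Lower values mean the two texts use words in a more similar way.
    """
    vocab = set(text_a.lower().split()) | set(text_b.lower().split())
    p = unigram_model(text_a, vocab, alpha)
    q = unigram_model(text_b, vocab, alpha)
    return kl_divergence(p, q) + kl_divergence(q, p)


if __name__ == "__main__":
    a = "great phone battery lasts long"
    b = "great phone battery lasts very long"
    c = "terrible hotel room dirty noisy"
    # A near-duplicate pair should score lower than an unrelated pair.
    print(symmetric_kl(a, b), symmetric_kl(a, c))
```

Under these assumptions, a pair of reviews with an unusually low score could be flagged as possible near-duplicates for closer inspection; where to set such a threshold is not something the abstract specifies.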
