Toward a Language Modeling Approach for Consumer Review Spam Detection
- 1 November 2010
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in 2010 IEEE 7th International Conference on E-Business Engineering
Abstract
Numerous reports have indicated the severity of fake reviews (i.e., spam) posted to various e-Commerce or opinion sharing Web sites. Nevertheless, very few studies have been conducted to examine the trustworthiness of online consumer reviews because of the lack of an effective computational methodology. Unlike other kinds of Web spam, untruthful reviews can look just like legitimate reviews (i.e., ham), so it is difficult to find features that distinguish the two classes. One main contribution of our research work is the development of a novel computational methodology to combat online review spam. Our experimental results confirm that the computational model based on KL divergence and probabilistic language modeling is effective for the detection of untruthful reviews. Empowered by the proposed computational methods, our empirical study found that around 2% of the consumer reviews posted to a large e-Commerce site are spam.
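
The abstract names KL divergence over probabilistic language models but does not give a formulation. As a rough illustration only, the sketch below compares two reviews by building additively smoothed unigram models over their shared vocabulary and summing the KL divergence in both directions; a low score suggests near-duplicate wording. The function names, the whitespace tokenizer, the smoothing constant `mu`, and the symmetric form are all assumptions made for this example and may differ from the paper's actual model.

```python
import math
from collections import Counter


def unigram_model(text, vocab, mu=1.0):
    """Additively smoothed unigram model over a fixed vocabulary (assumed scheme)."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def symmetric_kl(text_a, text_b, mu=1.0):
    """Sum of KL divergences in both directions between two review texts."""
    vocab = set(text_a.lower().split()) | set(text_b.lower().split())
    p = unigram_model(text_a, vocab, mu)
    q = unigram_model(text_b, vocab, mu)
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical reviews, invented for illustration.
review_a = "great phone battery lasts long"
review_b = "great phone battery lasts very long"
review_c = "terrible service never buying again"

near_duplicate_score = symmetric_kl(review_a, review_b)
unrelated_score = symmetric_kl(review_a, review_c)
```

Under these assumptions the near-duplicate pair scores lower than the unrelated pair, so a threshold on the score could flag suspiciously similar reviews; choosing that threshold would require labeled data.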