Collective Spammer Detection in Evolving Multi-Relational Social Networks

10 August 2015

conference paper
Published by Association for Computing Machinery (ACM)

https://doi.org/10.1145/2783258.2788606

Abstract

Detecting unsolicited content and the spammers who create it is a long-standing challenge that affects all of us on a daily basis. The recent growth of richly structured social networks has provided new challenges and opportunities in the spam detection landscape. Motivated by the Tagged.com social network, we develop methods to identify spammers in evolving multi-relational social networks. We model a social network as a time-stamped multi-relational graph where vertices represent users, and edges represent different activities between them. To identify spammer accounts, our approach makes use of structural features, sequence modelling, and collective reasoning. We leverage relational sequence information using k-gram features and probabilistic modelling with a mixture of Markov models. Furthermore, in order to perform collective reasoning and improve the predictive power of a noisy abuse reporting system, we develop a statistical relational model using hinge-loss Markov random fields (HL-MRFs), a highly scalable class of probabilistic graphical models. We use GraphLab Create and Probabilistic Soft Logic (PSL) to prototype and experimentally evaluate our solutions on internet-scale data from Tagged.com. Our experiments demonstrate the effectiveness of our approach, and show that models which incorporate the multi-relational nature of the social network achieve significantly better predictive performance than those that do not.
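
As a rough illustration of the k-gram sequence features mentioned in the abstract, the sketch below groups time-stamped edges into one action sequence per user and counts contiguous runs of k actions. It is a minimal sketch under stated assumptions, not the paper's implementation: the edge layout, the action names (`friend_request`, `message`), and the helper names `action_sequences` and `kgram_features` are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical edge layout: (timestamp, source_user, target_user, action_type).
Edge = Tuple[int, str, str, str]


def action_sequences(edges: List[Edge]) -> Dict[str, List[str]]:
    """Group actions by the user who initiated them, ordered by timestamp."""
    sequences: Dict[str, List[str]] = {}
    for _, src, _, action in sorted(edges, key=lambda e: e[0]):
        sequences.setdefault(src, []).append(action)
    return sequences


def kgram_features(sequence: List[str], k: int = 2) -> Counter:
    """Count every contiguous run of k actions in one user's sequence."""
    return Counter(tuple(sequence[i:i + k]) for i in range(len(sequence) - k + 1))


# Toy data with made-up users and action types.
edges = [
    (1, "u1", "u2", "friend_request"),
    (2, "u1", "u3", "message"),
    (3, "u2", "u1", "message"),
    (4, "u1", "u4", "friend_request"),
    (5, "u1", "u5", "message"),
]

sequences = action_sequences(edges)
features = {user: kgram_features(seq, k=2) for user, seq in sequences.items()}
```

The resulting per-user counts could then be fed to any standard classifier as a sparse feature vector. A sequence shorter than k simply yields no k-grams.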

Funding Information

National Science Foundation (IIS0746930)

Cited by 62 articles