Collective Spammer Detection in Evolving Multi-Relational Social Networks
- 10 August 2015
- conference paper
- Published by Association for Computing Machinery (ACM)
Abstract
Detecting unsolicited content and the spammers who create it is a long-standing challenge that affects all of us on a daily basis. The recent growth of richly-structured social networks has provided new challenges and opportunities in the spam detection landscape. Motivated by the Tagged.com social network, we develop methods to identify spammers in evolving multi-relational social networks. We model a social network as a time-stamped multi-relational graph where vertices represent users, and edges represent different activities between them. To identify spammer accounts, our approach makes use of structural features, sequence modelling, and collective reasoning. We leverage relational sequence information using k-gram features and probabilistic modelling with a mixture of Markov models. Furthermore, in order to perform collective reasoning and improve the predictive power of a noisy abuse reporting system, we develop a statistical relational model using hinge-loss Markov random fields (HL-MRFs), a class of probabilistic graphical models which are highly scalable. We use GraphLab Create and Probabilistic Soft Logic (PSL) to prototype and experimentally evaluate our solutions on internet-scale data from Tagged.com. Our experiments demonstrate the effectiveness of our approach, and show that models which incorporate the multi-relational nature of the social network significantly gain predictive performance over those that do not.
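As an illustration of the k-gram features mentioned in the abstract, the following is a minimal sketch of counting k-grams over one user's time-ordered sequence of actions. It is not the authors' implementation: the function name `kgram_counts` and the action labels (`friend_request`, `message`) are hypothetical and chosen only for the example.

```python
from collections import Counter
from typing import Sequence


def kgram_counts(actions: Sequence[str], k: int = 2) -> Counter:
    """Count every run of k consecutive actions in a time-ordered sequence."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Slide a window of width k over the sequence; each window is one k-gram.
    return Counter(
        tuple(actions[i:i + k]) for i in range(len(actions) - k + 1)
    )


# Hypothetical action log for a single user, ordered by timestamp.
user_actions = [
    "friend_request", "message", "message", "friend_request", "message",
]

# Bigram (k=2) counts that could serve as features for a classifier.
bigrams = kgram_counts(user_actions, k=2)
```

Each distinct k-gram becomes one feature dimension, so a user who repeats the same short pattern of actions many times yields a few large counts.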
Funding Information
- National Science Foundation (IIS0746930)