CADUE: Content-Agnostic Detection of Unwanted Emails for Enterprise Security

Abstract

End-to-end email encryption (E2EE) ensures that an email can be decrypted and read only by its intended recipients. E2EE's strong security guarantee is particularly desirable for enterprises in the event of a breach: even if attackers break into an email server, no email contents are leaked under E2EE. However, E2EE makes it significantly harder for an enterprise to detect and filter unwanted emails (spam and phishing emails). Most existing solutions rely heavily on email contents (i.e., the email body and attachments), which are unavailable once emails are encrypted. In this paper, we investigate how to detect unwanted emails in a content-agnostic manner, that is, without any access to email contents. Our key observation is that the communication patterns and relationships among an enterprise's internal users carry rich and reliable information about benign email communications. By combining this information with other email metadata (headers and subjects, when available), unwanted emails can be accurately distinguished from legitimate ones without access to email contents. Specifically, we propose two types of novel enterprise features derived from enterprise email logs: sender profiling features, which capture the patterns of past emails from external senders to internal recipients; and enterprise graph features, which capture the co-recipient and sender-recipient relationships among internal users. We design a classifier that uses these features together with existing metadata features. We run extensive experiments on a real-world enterprise email dataset and show that our approach, even without any content-based features, achieves a high true positive rate of 95.2% and a low false positive rate of 0.3% under this stringent constraint.
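The abstract does not say how the two feature families are computed. As a minimal sketch, assuming a header-only log of (sender, recipients) records and a single hypothetical internal domain `corp.example`, the Python below shows one plausible way to derive sender profiling features and enterprise graph features (co-recipient and internal sender-recipient links). The class `EnterpriseHistory`, the feature names, and the "fraction of recipient pairs already linked" definition are illustrative assumptions, not the authors' actual feature definitions, and the downstream classifier is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, Tuple

# Hypothetical enterprise domain used to decide which addresses are internal.
INTERNAL_DOMAIN = "corp.example"


@dataclass(frozen=True)
class Email:
    """Header-only view of an email: no body or attachments are stored."""
    sender: str
    recipients: Tuple[str, ...]


def is_internal(addr: str) -> bool:
    return addr.lower().endswith("@" + INTERNAL_DOMAIN)


class EnterpriseHistory:
    """Aggregates a historical email log into sender profiles and internal-user graphs."""

    def __init__(self) -> None:
        self.sender_count = defaultdict(int)  # external sender -> #past emails into the enterprise
        self.sender_reach = defaultdict(set)  # external sender -> internal users it has reached
        self.co_recipient = defaultdict(set)  # internal user -> users co-addressed with it
        self.correspond = defaultdict(set)    # internal user -> internal users it exchanged mail with

    def add(self, mail: Email) -> None:
        internal = {r for r in mail.recipients if is_internal(r) and r != mail.sender}
        if is_internal(mail.sender):
            # Internal sender-recipient links, stored as an undirected graph.
            for r in internal:
                self.correspond[mail.sender].add(r)
                self.correspond[r].add(mail.sender)
        elif internal:
            # Sender profile: how often, and to whom, this external sender wrote before.
            self.sender_count[mail.sender] += 1
            self.sender_reach[mail.sender].update(internal)
        # Co-recipient graph: internal users addressed together on the same email.
        for a, b in combinations(sorted(internal), 2):
            self.co_recipient[a].add(b)
            self.co_recipient[b].add(a)

    def features(self, mail: Email) -> Dict[str, float]:
        internal = sorted({r for r in mail.recipients if is_internal(r)})
        pairs = list(combinations(internal, 2))

        def linked_fraction(graph) -> float:
            # Share of recipient pairs already linked in `graph`;
            # 0.0 when there are fewer than two internal recipients.
            if not pairs:
                return 0.0
            return sum(b in graph.get(a, set()) for a, b in pairs) / len(pairs)

        reach = self.sender_reach.get(mail.sender, set())
        seen = sum(r in reach for r in internal) / len(internal) if internal else 0.0
        return {
            "sender_past_emails": float(self.sender_count.get(mail.sender, 0)),
            "recipients_seen_from_sender": seen,
            "co_recipient_cohesion": linked_fraction(self.co_recipient),
            "correspondence_cohesion": linked_fraction(self.correspond),
        }


if __name__ == "__main__":
    history = EnterpriseHistory()
    for past in [
        Email("alice@corp.example", ("bob@corp.example", "carol@corp.example")),
        Email("news@vendor.example", ("bob@corp.example",)),
        Email("news@vendor.example", ("bob@corp.example", "carol@corp.example")),
    ]:
        history.add(past)
    # A familiar sender writing to a familiar recipient group...
    print(history.features(Email("news@vendor.example", ("bob@corp.example", "carol@corp.example"))))
    # ...versus an unseen sender writing to an unrelated pair of users.
    print(history.features(Email("x@unknown.example", ("bob@corp.example", "dave@corp.example"))))
```

In this sketch the graph features are expressed as fractions of recipient pairs, which keeps them in [0, 1] regardless of how many internal users an email addresses; a real deployment would feed vectors like these, alongside header metadata features, into a trained classifier.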
