Weakly Supervised Extraction of Computer Security Events from Twitter
- 18 May 2015
- conference paper
- Published by Association for Computing Machinery (ACM)
- p. 896-905
- https://doi.org/10.1145/2736277.2741083
Abstract
Twitter contains a wealth of timely information; however, staying on top of breaking events requires that an information analyst constantly scan many sources, leading to information overload. For example, a user might wish to be made aware whenever an infectious disease outbreak takes place, when a new smartphone is announced, or when a distributed Denial of Service (DoS) attack might affect an organization's network connectivity. There are many possible event categories an analyst may wish to track, making it impossible to anticipate all those of interest in advance. We therefore propose a weakly supervised approach in which extractors for new categories of events are easy to define and train by specifying a small number of seed examples. We cast seed-based event extraction as a learning problem where only positive and unlabeled data is available. Rather than assuming unlabeled instances are negative, as is common in previous work, we propose a learning objective which regularizes the label distribution towards a user-provided expectation. In experiments on real-world data, our approach greatly outperforms heuristic negatives, which are used in most previous work. Significant performance gains are also demonstrated over two novel and competitive baselines: semi-supervised EM and one-class support-vector machines. We investigate three security-related events breaking on Twitter: DoS attacks, data breaches and account hijacking. A demonstration of security events extracted by our system is available at: http://kb1.cse.ohio-state.edu:8123/events/hackedKeywords
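The learning setup described in the abstract (positive seeds plus unlabeled data, with the predicted label distribution pulled towards a user-provided expectation) can be illustrated with a minimal sketch. This is not the authors' implementation: the logistic-regression model, the KL-divergence form of the regularizer, the hyperparameter values, the synthetic two-dimensional data, and the names `fit_pu_expectation` and `add_bias` are all assumptions made for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_pu_expectation(X_pos, X_unl, target_rate,
                       lam_unl=10.0, lam_l2=0.01, lr=0.05, steps=3000):
    """Gradient descent on an assumed positive-unlabeled objective:
         mean over seeds of -log p(y=1|x)
       + lam_unl * KL(target_rate || mean predicted rate on unlabeled data)
       + (lam_l2 / 2) * ||w||^2
    Unlabeled points are never treated as negatives."""
    w = np.zeros(X_pos.shape[1])
    for _ in range(steps):
        p_pos = sigmoid(X_pos @ w)
        p_unl = sigmoid(X_unl @ w)
        # Average predicted positive rate on the unlabeled pool.
        p_hat = np.clip(p_unl.mean(), 1e-6, 1 - 1e-6)
        # Gradient of the seed log-loss term.
        grad_pos = -((1.0 - p_pos)[:, None] * X_pos).mean(axis=0)
        # Chain rule for the KL term: dKL/dp_hat times dp_hat/dw.
        dkl_dp = -target_rate / p_hat + (1.0 - target_rate) / (1.0 - p_hat)
        dp_dw = ((p_unl * (1.0 - p_unl))[:, None] * X_unl).mean(axis=0)
        w -= lr * (grad_pos + lam_unl * dkl_dp * dp_dw + lam_l2 * w)
    return w

def add_bias(X):
    # Append a constant feature so the model can learn an intercept.
    return np.hstack([X, np.ones((X.shape[0], 1))])

# Synthetic stand-in data (hypothetical, not from the paper): a few seed
# positives and an unlabeled pool in which about 10% are hidden positives.
rng = np.random.default_rng(0)
seeds = add_bias(rng.normal(2.0, 1.0, size=(50, 2)))
hidden_pos = add_bias(rng.normal(2.0, 1.0, size=(50, 2)))
hidden_neg = add_bias(rng.normal(-2.0, 1.0, size=(450, 2)))
unlabeled = np.vstack([hidden_pos, hidden_neg])

w = fit_pu_expectation(seeds, unlabeled, target_rate=0.1)
p_hat = sigmoid(unlabeled @ w).mean()
print("mean predicted positive rate on unlabeled data:", p_hat)
```

In this sketch `target_rate` plays the role of the user-provided expectation: it states roughly what fraction of unlabeled instances should be positive, so the model is penalized both for labeling everything positive and for labeling everything negative.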
Funding Information
- Department of Defense (FA8721-05-C-0003)
- DARPA (FA8750-13-2-0005)