TUT
- 29 October 2012
- conference paper
- Published by Association for Computing Machinery (ACM) in Proceedings of the 21st ACM international conference on Information and knowledge management - CIKM '12
- p. 972-981
- https://doi.org/10.1145/2396761.2396884
Abstract
The rapid development of online social media sites has been accompanied by the generation of a tremendous amount of web content. Web users are shifting from data consumers to data producers. As a result, topic detection and tracking that does not take users' interests into account is no longer sufficient. This paper presents a statistical model that detects interpretable trends and topics from document streams, where each trend (short for trending story) corresponds to a series of continuing events or a storyline. A topic is represented by a cluster of frequently co-occurring words. A trend can contain multiple topics, and a topic can be shared by different trends. In addition, by leveraging a Recurrent Chinese Restaurant Process (RCRP), the number of trends in our model is determined automatically without human intervention, so that the model can better generalize to unseen data. Furthermore, the proposed model incorporates user interest to simulate the full generation process of web content, which offers the opportunity for personalized recommendation in online social media. Experiments on three different datasets indicate that the proposed model can capture meaningful topics and trends, monitor the rise and fall of detected trends, outperform a baseline approach in terms of perplexity on a held-out dataset, and improve user participation prediction by leveraging users' interests in different trends.
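The abstract does not spell out how the RCRP lets the number of trends grow with the data, so the following is only a minimal sketch of the general idea, not the paper's model. It assumes a simplified prior in which a document joins an existing trend with weight equal to that trend's count in the current epoch plus exponentially decayed counts from recent epochs, and opens a new trend with weight `alpha`. The function name `rcrp_assign` and the parameters `alpha`, `decay`, and `window` are hypothetical and chosen for illustration.

```python
import random


def rcrp_assign(docs_per_epoch, alpha=1.0, decay=0.5, window=3, seed=0):
    """Sample trend ids from a simplified recurrent CRP prior (illustrative only).

    docs_per_epoch: list of ints, the number of documents arriving in each epoch.
    Returns one list of trend ids per epoch.
    """
    rng = random.Random(seed)
    history = []      # one dict per past epoch: trend id -> document count
    next_id = 0       # id that a newly opened trend would receive
    assignments = []
    for n_docs in docs_per_epoch:
        # Pseudo-counts inherited from the last `window` epochs, decayed by age.
        prior = {}
        for age, counts in enumerate(reversed(history[-window:]), start=1):
            for trend, count in counts.items():
                prior[trend] = prior.get(trend, 0.0) + count * decay ** age
        current = {}  # counts accumulated within this epoch
        epoch_ids = []
        for _ in range(n_docs):
            ids = sorted(set(prior) | set(current))
            weights = [prior.get(k, 0.0) + current.get(k, 0) for k in ids]
            # The last candidate is a brand-new trend, weighted by alpha.
            ids.append(next_id)
            weights.append(alpha)
            choice = rng.choices(ids, weights=weights, k=1)[0]
            if choice == next_id:
                next_id += 1
            current[choice] = current.get(choice, 0) + 1
            epoch_ids.append(choice)
        history.append(current)
        assignments.append(epoch_ids)
    return assignments


if __name__ == "__main__":
    for epoch, ids in enumerate(rcrp_assign([8, 8, 8], seed=42)):
        print(f"epoch {epoch}: trends {ids}")
```

In this sketch, trends that received many documents recently are more likely to attract new ones, trends that stop receiving documents fade as their decayed counts shrink, and the total number of trends is never fixed in advance.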