Exploring social annotations for information retrieval

21 April 2008

conference paper
Published by Association for Computing Machinery (ACM)

p. 715-724
https://doi.org/10.1145/1367497.1367594

Abstract

Social annotation has gained increasing popularity in many Web-based applications, leading to an emerging research area in text analysis and information retrieval. This paper is concerned with developing probabilistic models and computational algorithms for social annotations. We propose a unified framework that combines the modeling of social annotations with language modeling-based methods for information retrieval. The proposed approach consists of two steps: (1) discovering topics in the contents and annotations of documents while categorizing the users by domains; and (2) enhancing document and query language models by incorporating user domain interests as well as topical background models. In particular, we propose a new general generative model for social annotations, which is then simplified to a computationally tractable hierarchical Bayesian network. We then apply smoothing techniques in a risk minimization framework to incorporate the topical information into the language models. Experiments are carried out on a real-world annotation data set sampled from del.icio.us. Our results demonstrate significant improvements over the traditional approaches.
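To make step (2) of the abstract more concrete, the following is a minimal sketch of query-likelihood retrieval in which a document language model is smoothed with a topic-based estimate and a collection background model. It is not the paper's model: this record does not give the generative model or the risk-minimization details, so plain linear interpolation is swapped in. The function names, the weights `lam` and `mu`, and all topic and collection probabilities are hypothetical toy values chosen for illustration.

```python
import math
from collections import Counter


def smoothed_doc_model(doc_tokens, doc_topic_probs, topic_word_probs,
                       collection_probs, lam=0.5, mu=0.3):
    """Build p(w | d) as a three-way linear interpolation (assumed form).

    lam weights the document's maximum-likelihood estimate, mu weights the
    topic-based estimate sum_z p(w | z) * p(z | d), and the remaining mass
    1 - lam - mu goes to the collection background model.
    """
    counts = Counter(doc_tokens)
    total = len(doc_tokens)

    def prob(word):
        p_ml = counts[word] / total if total else 0.0
        p_topic = sum(p_z * topic_word_probs[z].get(word, 0.0)
                      for z, p_z in doc_topic_probs.items())
        # Tiny floor so unseen words never produce log(0).
        p_bg = collection_probs.get(word, 1e-9)
        return lam * p_ml + mu * p_topic + (1.0 - lam - mu) * p_bg

    return prob


def query_log_likelihood(query_tokens, doc_prob):
    """Score a document by the log-probability of generating the query."""
    return sum(math.log(doc_prob(w)) for w in query_tokens)


# Toy data (hypothetical): two topics, as if already discovered in step (1).
topics = {
    "prog": {"python": 0.5, "code": 0.5},
    "food": {"recipe": 0.5, "pasta": 0.5},
}
collection = {"python": 0.25, "code": 0.25, "recipe": 0.25, "pasta": 0.25}

doc_a = smoothed_doc_model(["code", "code"], {"prog": 0.9, "food": 0.1},
                           topics, collection)
doc_b = smoothed_doc_model(["pasta", "recipe"], {"prog": 0.1, "food": 0.9},
                           topics, collection)

query = ["python"]
score_a = query_log_likelihood(query, doc_a)
score_b = query_log_likelihood(query, doc_b)
print(score_a > score_b)
```

Neither toy document contains the query word, yet the programming-oriented document scores higher because its topic mixture assigns probability to "python". This is the kind of effect that topical smoothing is meant to provide; the interpolation weights would normally be tuned on held-out data.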

Cited by 107 articles