Multidocument summarization
- 1 April 2004
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Information Systems
- Vol. 22 (2), 215-241
- https://doi.org/10.1145/984321.984323
Abstract
An increasingly common problem in effective information access is the presence, in a single corpus, of multiple documents that contain similar information. Users are often interested in locating one or several particular aspects of a topic addressed by a group of similar documents. This kind of task, called instance or aspectual retrieval, has been explored in several TREC Interactive Tracks. In this article, we propose complementing the classification capacity of clustering techniques with an indicative extract of the contents of several sources, produced by multidocument summarization techniques. Two kinds of summaries are provided. The first covers the similarities within each cluster of retrieved documents. The second shows the particularities of each document with respect to the common topic of the cluster. The multitopic structure of each document is used to determine the similarities and differences among topics in the cluster. The system is independent of document domain and genre. A user evaluation of the proposed system shows significant improvements in effectiveness. The results of previous experiments comparing clustering algorithms are also reported.
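The two kinds of summaries described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract says the system relies on the multitopic structure of each document, whereas this sketch swaps in a much simpler sentence-level cosine similarity over raw term counts. The function names, the dictionary layout of the cluster, and the 0.5 threshold are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Lowercased alphanumeric tokens; a crude stand-in for real preprocessing.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def split_common_and_particular(cluster, threshold=0.5):
    """Split each document's sentences into shared and document-specific ones.

    cluster: dict mapping a document id to its list of sentences (assumed layout).
    threshold: hypothetical similarity cut-off, not taken from the article.
    Returns (common, particular), each a dict from document id to sentences.
    """
    vectors = {
        doc: [Counter(tokenize(s)) for s in sents] for doc, sents in cluster.items()
    }
    common = {doc: [] for doc in cluster}
    particular = {doc: [] for doc in cluster}
    for doc, sents in cluster.items():
        for sent, vec in zip(sents, vectors[doc]):
            # Best match of this sentence against every sentence of every
            # other document in the cluster.
            best = max(
                (
                    cosine(vec, other_vec)
                    for other, other_vecs in vectors.items()
                    if other != doc
                    for other_vec in other_vecs
                ),
                default=0.0,
            )
            target = common if best >= threshold else particular
            target[doc].append(sent)
    return common, particular
```

Under these assumptions, the sentences collected in `common` would feed a summary of what the cluster shares, and those in `particular` would feed a per-document summary of what each source adds to the common topic.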