Rough set-aided keyword reduction for text categorization
- 1 October 2001
- journal article
- research article
- Published by Informa UK Limited in Applied Artificial Intelligence
- Vol. 15 (9), 843-873
- https://doi.org/10.1080/088395101753210773
Abstract
The volume of electronically stored information increases exponentially as the state of the art progresses. Automated information filtering (IF) and information retrieval (IR) systems are therefore acquiring rapidly increasing prominence. However, such systems sacrifice efficiency to boost effectiveness. Such systems typically have to cope with sets of vectors of many tens of thousands of dimensions. Rough set (RS) theory can be applied to reducing the dimensionality of data used in IF/IR tasks, by providing a measure of the information content of datasets with respect to a given classification. This can aid IF/IR systems that rely on the acquisition of large numbers of term weights or other measures of relevance. This article investigates the applicability of RS theory to the IF/IR application domain and compares this applicability across various existing text categorization (TC) techniques. The ability of the approach to generalize, given a minimum of training data, is also addressed. The background of RS theory is presented, with an illustrative example to demonstrate the operation of the RS-based dimensionality reduction. A modular system is proposed which allows the integration of this technique with a large variety of different IF/IR approaches. The example application, categorization of E-mail messages, is described. Systematic experiments and their results are reported and analyzed.
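To make the idea of measuring "information content with respect to a given classification" concrete, the following is a minimal sketch of the standard rough set dependency degree (the fraction of examples whose class is fully determined by a chosen subset of attributes), combined with a simple greedy forward search for a small attribute subset. The abstract does not name the search procedure, so the greedy loop is an assumption, as are the function names, the four keywords, and the toy e-mail data, all of which are hypothetical.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, labels, attrs):
    """Dependency degree: share of rows lying in single-class equivalence classes."""
    if not rows:
        return 0.0
    consistent = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            consistent += len(block)
    return consistent / len(rows)

def greedy_reduct(rows, labels):
    """Greedy forward selection (an assumed search strategy, not taken from the source).

    Adds the attribute giving the largest dependency gain until the subset
    matches the dependency degree of the full attribute set.
    """
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    chosen = []
    while dependency(rows, labels, chosen) < target:
        best = max(
            (a for a in all_attrs if a not in chosen),
            key=lambda a: dependency(rows, labels, chosen + [a]),
        )
        chosen.append(best)
    return chosen

# Hypothetical toy data: keyword presence (1) or absence (0) per message.
keywords = ["meeting", "offer", "free", "report"]
rows = [
    (1, 0, 0, 1),
    (1, 0, 0, 0),
    (0, 1, 1, 0),
    (0, 0, 1, 0),
    (0, 1, 0, 1),
    (1, 1, 1, 0),
]
labels = ["work", "work", "spam", "spam", "work", "spam"]

reduct = greedy_reduct(rows, labels)
print([keywords[a] for a in reduct])  # prints ['free']
```

In this toy dataset a single keyword already determines the class of every message, so the other three columns can be dropped without losing classification information. Real term-document data would have thousands of columns and usually non-binary weights, which would need to be discretized before this kind of equivalence-class analysis applies.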
This publication has 7 references indexed in Scilit:
- Combining rough sets and data-driven fuzzy learning for generation of classification rules. Pattern Recognition, 1999
- Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering. MIT Press, 1996
- Rough set reduction of attributes and their domains for neural networks. Computational Intelligence, 1995
- Constructing Decision Trees. Elsevier BV, 1993
- Rough Sets. Springer Science and Business Media LLC, 1991
- Term-weighting approaches in automatic text retrieval. Information Processing & Management, 1988
- Rough sets. International Journal of Parallel Programming, 1982