Distributional clustering of words for text classification
- 1 August 1998
- conference paper
- Published by Association for Computing Machinery (ACM)
Abstract
This paper describes the application of Distributional Clustering (20) to document classification. This approach clusters words into groups based on the distribution of class labels associated with each word. Because the clustering uses the class labels, we are able, unlike unsupervised dimensionality-reduction techniques such as Latent Semantic Indexing, to compress the feature space much more aggressively while still maintaining high document classification accuracy. Experimental results obtained on three real-world data sets show that we can reduce the feature dimensionality by three orders of magnitude and lose only 2% accuracy, significantly better than Latent Semantic Indexing (6), class-based clustering (1), feature selection by mutual information (23), or Markov-blanket-based feature selection (13). We also show that less aggressive clustering sometimes results in improved classification accuracy over classification without clustering.
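The clustering step described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the paper's implementation. It represents each word w by its class distribution P(C|w), estimated from hypothetical word-by-class counts (every word is assumed to have at least one count). It scores a candidate merge of two clusters by a prior-weighted "KL divergence to the mean" (each distribution's divergence from their weighted average), and greedily merges the cheapest pair until k clusters remain. The function names, the toy counts, and the naive O(n^3) all-pairs loop are illustrative assumptions. After clustering, a document is represented by one count per cluster instead of one count per word.

```python
import math


def class_distributions(counts):
    """Map each word to (P(w), P(C|w)) from hypothetical word -> {class: count} data."""
    classes = sorted({cls for by_class in counts.values() for cls in by_class})
    total = sum(sum(by_class.values()) for by_class in counts.values())
    stats = {}
    for word, by_class in counts.items():
        n = sum(by_class.values())  # assumed > 0 for every word
        stats[word] = (n / total, [by_class.get(cls, 0) / n for cls in classes])
    return stats


def kl(p, q):
    """KL divergence D(p || q); terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def merge(a, b):
    """Combine two clusters (weight, class distribution) into their weighted mean."""
    (wa, pa), (wb, pb) = a, b
    w = wa + wb
    return w, [(wa * x + wb * y) / w for x, y in zip(pa, pb)]


def kl_to_mean(a, b):
    """Weighted divergence of both class distributions from their merged mean."""
    (wa, pa), (wb, pb) = a, b
    _, mean = merge(a, b)
    return wa * kl(pa, mean) + wb * kl(pb, mean)


def cluster_words(counts, k):
    """Greedy agglomerative clustering of words into k groups (illustrative only)."""
    clusters = {frozenset([w]): s for w, s in class_distributions(counts).items()}
    while len(clusters) > k:
        keys = list(clusters)
        # Test every pair and pick the one whose merge loses the least class information.
        a, b = min(
            ((x, y) for i, x in enumerate(keys) for y in keys[i + 1:]),
            key=lambda pair: kl_to_mean(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a | b] = merge(clusters.pop(a), clusters.pop(b))
    return [sorted(group) for group in clusters]


def compress(tokens, clusters):
    """Turn a token list into one count per word cluster; unseen words are dropped."""
    index = {w: i for i, group in enumerate(clusters) for w in group}
    vec = [0] * len(clusters)
    for tok in tokens:
        if tok in index:
            vec[index[tok]] += 1
    return vec


# Toy word-by-class counts, invented purely for illustration.
counts = {
    "goal": {"sports": 9, "politics": 1},
    "match": {"sports": 8, "politics": 2},
    "vote": {"sports": 1, "politics": 9},
    "senate": {"politics": 10},
    "the": {"sports": 50, "politics": 50},
}
clusters = sorted(cluster_words(counts, 3))
print(clusters)
print(compress(["goal", "match", "vote", "the", "xyz"], clusters))
```

On these toy counts the words that lean the same way ("goal" and "match" toward sports, "vote" and "senate" toward politics) share a feature, while "the", whose class distribution is uniform, stays on its own, so a five-word vector becomes a three-cluster vector. A realistic vocabulary would need a cheaper merge schedule than testing every pair on each step.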
References (partial list):
- Elements of Information Theory. Wiley, 2001.
- Threading electronic mail: A preliminary study. Information Processing & Management, 1997.
- On Bias, Variance, 0/1-Loss, and the Curse-of-Dimensionality. Data Mining and Knowledge Discovery, 1997.
- On the Optimality of the Simple Bayesian Classifier under Zero-One Loss. Machine Learning, 1997.
- Noise reduction in a statistical approach to text categorization. Association for Computing Machinery (ACM), 1995.
- Similarity-based estimation of word cooccurrence probabilities. Association for Computational Linguistics (ACL), 1994.
- Distributional clustering of English words. Association for Computational Linguistics (ACL), 1993.
- Indexing by latent semantic analysis. Journal of the American Society for Information Science, 1990.
- Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 1967.