An improved boosting algorithm and its application to text categorization
- 6 November 2000
- conference paper
- Published by Association for Computing Machinery (ACM)
Abstract
We describe AdaBoost.MHKR, an improved boosting algorithm, and its application to text categorization. Boosting is a method for supervised learning which has been successfully applied to many different domains, and which has proven one of the best performers in text categorization exercises so far. Boosting is based on the idea of relying on the collective judgment of a committee of classifiers that are trained sequentially. In training the i-th classifier, special emphasis is placed on the correct categorization of the training documents which have proven harder for the previously trained classifiers. AdaBoost.MHKR is based on the idea of building, at every iteration of the learning phase, not a single classifier but a sub-committee of the K classifiers which, at that iteration, look the most promising. We report the results of systematic experimentation with this method on the standard Reuters-21578 benchmark. These experiments have shown that AdaBoost.MHKR is both more efficient to train and more effective than the original AdaBoost.MHR algorithm.
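The sub-committee idea in the abstract can be illustrated with a minimal sketch. This is not the authors' AdaBoost.MHKR: it assumes a single binary category, documents represented as sets of terms, and simple term-presence stumps as weak classifiers. The names `stump`, `train`, `classify`, the parameter `k`, and the toy documents are all hypothetical and chosen only for illustration.

```python
import math

def stump(term, polarity):
    # Weak classifier: votes `polarity` if the term occurs in the document,
    # and the opposite sign otherwise.
    return lambda doc: polarity if term in doc else -polarity

def train(docs, labels, rounds=3, k=2):
    """Boost sub-committees of the k most promising stumps (simplified sketch)."""
    n = len(docs)
    weights = [1.0 / n] * n              # uniform initial document weights
    vocab = sorted(set().union(*docs))
    committee = []                       # list of (alpha, sub-committee) pairs
    for _ in range(rounds):
        # Score every term by its weighted correlation with the labels.
        scored = []
        for term in vocab:
            r = sum(w * y * (1 if term in d else -1)
                    for w, y, d in zip(weights, labels, docs))
            scored.append((abs(r), term, 1 if r >= 0 else -1))
        scored.sort(key=lambda s: (-s[0], s[1]))
        # Keep the k best stumps instead of only the single best one.
        sub = [stump(term, pol) for _, term, pol in scored[:k]]

        def vote(doc, sub=sub):
            # Sub-committee output: average vote, a value in [-1, 1].
            return sum(h(doc) for h in sub) / len(sub)

        r = sum(w * y * vote(d) for w, y, d in zip(weights, labels, docs))
        r = max(min(r, 1 - 1e-10), -1 + 1e-10)   # avoid log of 0 or infinity
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        committee.append((alpha, sub))
        # Re-weight: documents the sub-committee got wrong gain weight.
        weights = [w * math.exp(-alpha * y * vote(d))
                   for w, y, d in zip(weights, labels, docs)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return committee

def classify(committee, doc):
    # Final decision: sign of the alpha-weighted sum of sub-committee votes.
    score = sum(alpha * sum(h(doc) for h in sub) / len(sub)
                for alpha, sub in committee)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): +1 marks documents in the category, -1 the rest.
docs = [{"wheat", "farm"}, {"wheat", "export"}, {"oil", "export"}, {"oil", "price"}]
labels = [1, 1, -1, -1]
committee = train(docs, labels, rounds=3, k=2)
predictions = [classify(committee, d) for d in docs]
```

With `k=1` the sketch reduces to ordinary boosting with one stump per round; larger `k` selects several stumps from a single scoring pass over the vocabulary, which is one plausible reading of why a sub-committee variant could be cheaper to train.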
This publication has 12 references indexed in Scilit:
- Maximizing text-mining performance. IEEE Intelligent Systems and their Applications, 1999
- Context-sensitive learning methods for text categorization. ACM Transactions on Information Systems, 1999
- An Evaluation of Statistical Approaches to Text Categorization. Information Retrieval Journal, 1999
- Boosting and Rocchio applied to text filtering. Published by Association for Computing Machinery (ACM), 1998
- Classification of Text Documents. The Computer Journal, 1998
- A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences, 1997
- Combining classifiers in text categorization. Published by Association for Computing Machinery (ACM), 1996
- Method combination for document filtering. Published by Association for Computing Machinery (ACM), 1996
- Evaluating and optimizing autonomous text classification systems. Published by Association for Computing Machinery (ACM), 1995
- Irrelevant Features and the Subset Selection Problem. Published by Elsevier BV, 1994