Refined experts

19 July 2009

conference paper
Published by Association for Computing Machinery (ACM)

p. 11-18
https://doi.org/10.1145/1571941.1571946

Abstract

While large-scale taxonomies, especially for web pages, have been in existence for some time, approaches to automatically classifying documents into these taxonomies have met with limited success compared to the more general progress made in text classification. We argue that this stems from three causes: increasing sparsity of training data at deeper nodes in the taxonomy, error propagation where a mistake made high in the hierarchy cannot be recovered, and increasingly complex decision surfaces at higher nodes in the hierarchy. While prior research has focused on the first problem, we introduce methods that target the latter two problems: first, by biasing the training distribution to reduce error propagation, and second, by propagating "first-guess" expert information upward in a bottom-up manner before making a refined top-down choice. Finally, we present an empirical study demonstrating that the suggested changes lead to 10-30% improvements in F1 scores versus an accepted competitive baseline, hierarchical SVMs.
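The error-propagation problem named in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm or data: the taxonomy, the category names, the scores, and the function `classify_top_down` are all hypothetical, chosen only to show how a greedy top-down pass cannot recover from a mistake near the root, even when a deeper "expert" is confident.

```python
# Hypothetical toy taxonomy (parent -> children); not taken from the paper.
TAXONOMY = {
    "root": ["Science", "Sports"],
    "Science": ["Physics", "Biology"],
    "Sports": ["Soccer", "Tennis"],
}


def classify_top_down(scores, node="root"):
    """Greedily follow the highest-scoring child until a leaf is reached."""
    path = []
    while node in TAXONOMY:
        node = max(TAXONOMY[node], key=lambda child: scores[child])
        path.append(node)
    return path


# Made-up per-node classifier scores for one document whose true leaf
# is assumed to be "Physics".
scores = {
    "Science": 0.45, "Sports": 0.55,    # narrow mistake at the top level
    "Physics": 0.95, "Biology": 0.05,   # confident leaf classifier, never consulted
    "Soccer": 0.60, "Tennis": 0.40,
}

predicted_path = classify_top_down(scores)

# A leaf-level "first guess" taken over all leaves, ignoring the hierarchy.
all_children = {child for children in TAXONOMY.values() for child in children}
leaves = sorted(all_children - set(TAXONOMY))
first_guess_leaf = max(leaves, key=lambda leaf: scores[leaf])

print(predicted_path)     # path chosen by the greedy top-down pass
print(first_guess_leaf)   # leaf that the leaf-level scores favor
```

In this toy case the top-down pass commits to the wrong branch and never sees the strong leaf-level score, which is the kind of signal the abstract proposes passing upward before the final top-down decision. How the paper actually combines that information is not specified in this record.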

Cited by 75 articles