Hierarchical Classification of Real Life Documents

Abstract
1 Introduction
Two features have successfully made on-line information comprehensible and accessible to people: hierarchically structured classes, where topics are organized into a hierarchy of increasing specificity, and multi-classed documents, where a document is classified into all relevant classes. One such information source is Yahoo!, where a document on Dance, for example, can be reached from both the Arts: Performing_Arts and Recreation topics in the topic hierarchy. The hierarchical structure of classes allows information to be examined and browsed at various levels of topic specificity, and the multi-class feature allows information to be accessed from all related topics. However, most document classification techniques assume a flat class space and a single class per document. Documents classified by such techniques are difficult for people to browse and access, especially when there are many classes, as in Yahoo!. In this paper, we propose a new technique for automatic classification of documents that addresses these real life requirements. Doing so raises several research issues, which we illustrate using Yahoo!.
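To make the two features concrete, the following is a minimal sketch of a hierarchical, multi-classed document collection. It is only an illustration of the data model, not the classification technique proposed in this paper. The names `PARENT`, `DOC_CLASSES`, `dance_page`, `ancestors_and_self`, and `build_index` are hypothetical, and the hierarchy fragment is assumed from the Dance example above.

```python
from collections import defaultdict

# Hypothetical fragment of a topic hierarchy: child -> parent.
# None marks a top-level topic.
PARENT = {
    "Arts": None,
    "Arts:Performing_Arts": "Arts",
    "Recreation": None,
}

# Multi-classed documents: each document lists every class it belongs to.
# "dance_page" is an assumed document name used only for illustration.
DOC_CLASSES = {
    "dance_page": {"Arts:Performing_Arts", "Recreation"},
}


def ancestors_and_self(topic):
    """Return the topic followed by its ancestors up to the top level."""
    path = []
    while topic is not None:
        path.append(topic)
        topic = PARENT[topic]
    return path


def build_index(doc_classes):
    """Map every topic to the documents reachable at or below it."""
    index = defaultdict(set)
    for doc, classes in doc_classes.items():
        for cls in classes:
            for topic in ancestors_and_self(cls):
                index[topic].add(doc)
    return index


index = build_index(DOC_CLASSES)
# Topics from which the Dance document can be reached while browsing.
print(sorted(t for t, docs in index.items() if "dance_page" in docs))
```

Under these assumptions the Dance document is reachable from Arts, Arts:Performing_Arts, and Recreation, which is the browsing behavior a flat, single-class scheme cannot provide.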