AUTHORSHIP ATTRIBUTION BASED ON FEATURE SET SUBSPACING ENSEMBLES
- 1 October 2006
- journal article
- Published by World Scientific Pub Co Pte Ltd in International Journal on Artificial Intelligence Tools
- Vol. 15 (5), 823-838
- https://doi.org/10.1142/s0218213006002965
Abstract
Authorship attribution can assist the criminal investigation procedure as well as cybercrime analysis. This task can be viewed as a single-label multi-class text categorization problem. Given that the style of a text can be represented as mere word frequencies selected in a language-independent manner, machine learning techniques able to deal with high-dimensional feature spaces and sparse data can be directly applied to solve this problem. This paper focuses on classifier ensembles based on feature set subspacing. It is shown that an effective ensemble can be constructed using exhaustive disjoint subspacing, a simple method producing many poor but diverse base classifiers. The simple model can be enhanced by a variation of the technique of cross-validated committees applied to the feature set. Experiments on two benchmark text corpora demonstrate the effectiveness of the presented method, which improves on previously reported results, and compare it to support vector machines, an alternative machine learning approach suitable for authorship attribution.
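The ensemble idea in the abstract can be illustrated with a minimal sketch. It reads "exhaustive disjoint subspacing" as a random partition of the feature set in which every feature is used by exactly one base classifier. The base learner (a nearest-centroid classifier), the combination rule (averaging per-class scores), the toy word-frequency data, and all function and class names are assumptions made for illustration; the abstract does not specify them.

```python
import math
import random


def disjoint_subspaces(n_features, subset_size, seed=0):
    """Randomly partition feature indices into disjoint subsets that cover all features."""
    indices = list(range(n_features))
    random.Random(seed).shuffle(indices)
    return [indices[i:i + subset_size] for i in range(0, n_features, subset_size)]


class CentroidClassifier:
    """Hypothetical stand-in base learner: nearest class centroid on one feature subset."""

    def __init__(self, features):
        self.features = features
        self.centroids = {}

    def fit(self, X, y):
        sums, counts = {}, {}
        for row, label in zip(X, y):
            vec = [row[j] for j in self.features]
            acc = sums.setdefault(label, [0.0] * len(vec))
            for k, value in enumerate(vec):
                acc[k] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_proba(self, row):
        # Softmax over negative Euclidean distances to each class centroid.
        vec = [row[j] for j in self.features]
        scores = {c: -math.dist(vec, cen) for c, cen in self.centroids.items()}
        top = max(scores.values())
        exps = {c: math.exp(s - top) for c, s in scores.items()}
        total = sum(exps.values())
        return {c: e / total for c, e in exps.items()}


def fit_ensemble(X, y, subset_size, seed=0):
    """Train one weak base classifier per disjoint feature subset."""
    subsets = disjoint_subspaces(len(X[0]), subset_size, seed)
    return [CentroidClassifier(features).fit(X, y) for features in subsets]


def predict(ensemble, row):
    """Average the base classifiers' class scores and return the best class."""
    classes = list(ensemble[0].centroids)
    probas = [model.predict_proba(row) for model in ensemble]
    avg = {c: sum(p[c] for p in probas) / len(probas) for c in classes}
    return max(avg, key=avg.get)


# Toy word-frequency vectors (made up): six "words", two candidate authors.
X = [[5, 4, 6, 1, 0, 1], [6, 5, 5, 0, 1, 0],
     [1, 0, 1, 5, 6, 4], [0, 1, 0, 6, 5, 5]]
y = ["A", "A", "B", "B"]

ensemble = fit_ensemble(X, y, subset_size=2)
print(predict(ensemble, [5, 5, 5, 0, 0, 0]))
print(predict(ensemble, [0, 0, 0, 5, 5, 5]))
```

Each base classifier sees only a small slice of the features, so it is individually weak, but the slices differ, which is the source of the diversity the abstract refers to. The cross-validated-committees variation mentioned in the abstract is not covered by this sketch.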