Open Access
Abstract
A great many tools have been developed for supervised classification, ranging from early methods such as linear discriminant analysis through to modern developments such as neural networks and support vector machines. A large number of comparative studies have been conducted in attempts to establish the relative superiority of these methods. This paper argues that these comparisons often fail to take into account important aspects of real problems, so that the apparent superiority of more sophisticated methods may be something of an illusion. In particular, simple methods typically yield performance almost as good as more sophisticated methods, to the extent that the difference in performance may be swamped by other sources of uncertainty that generally are not considered in the classical supervised classification paradigm.

Comment: This paper is commented on in [math.ST/0606447], [math.ST/0606452], [math.ST/0606455], [math.ST/0606457]. Rejoinder in [math.ST/0606461]. Published at http://dx.doi.org/10.1214/088342306000000060 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

This publication has 33 references indexed in Scilit: