Are Loss Functions All the Same?
- 1 May 2004
- journal article
- Published by MIT Press in Neural Computation
- Vol. 16 (5), 1063-1076
- https://doi.org/10.1162/089976604773135104
Abstract
In this letter, we investigate the impact of choosing different loss functions from the viewpoint of statistical learning theory. We introduce a convexity assumption, which is met by all loss functions commonly used in the literature, and study how the bound on the estimation error changes with the loss. We also derive a general result on the minimizer of the expected risk for a convex loss function in the case of classification. The main outcome of our analysis is that for classification, the hinge loss appears to be the loss of choice. Other things being equal, the hinge loss leads to a convergence rate practically indistinguishable from the logistic loss rate and much better than the square loss rate. Furthermore, if the hypothesis space is sufficiently rich, the bounds obtained for the hinge loss are not loosened by the thresholding stage.
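For reference, the three classification losses compared in the abstract are usually written in terms of the margin $y f(x)$, with labels $y \in \{-1, +1\}$. The definitions below are the standard ones; the notation $V$, $y$, and $f(x)$ is used here for illustration and may differ from the paper's own.

```latex
V_{\text{hinge}}(y, f(x))    = \max\{0,\; 1 - y f(x)\}
V_{\text{logistic}}(y, f(x)) = \log\bigl(1 + e^{-y f(x)}\bigr)
V_{\text{square}}(y, f(x))   = \bigl(1 - y f(x)\bigr)^{2} = \bigl(y - f(x)\bigr)^{2}
```

All three are convex in $f(x)$, which is the convexity assumption the analysis relies on; they differ in how heavily they penalize points far from the decision boundary, which is what drives the differing convergence rates discussed in the abstract.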