Are Loss Functions All the Same?
- 1 May 2004
- journal article
- Published by MIT Press in Neural Computation
- Vol. 16 (5), 1063-1076
- https://doi.org/10.1162/089976604773135104
Abstract
In this letter, we investigate the impact of choosing different loss functions from the viewpoint of statistical learning theory. We introduce a convexity assumption, which is met by all loss functions commonly used in the literature, and study how the bound on the estimation error changes with the loss. We also derive a general result on the minimizer of the expected risk for a convex loss function in the case of classification. The main outcome of our analysis is that for classification, the hinge loss appears to be the loss of choice. Other things being equal, the hinge loss leads to a convergence rate practically indistinguishable from the logistic loss rate and much better than the square loss rate. Furthermore, if the hypothesis space is sufficiently rich, the bounds obtained for the hinge loss are not loosened by the thresholding stage.
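For reference, the three classification losses compared in the abstract are usually written in terms of the margin $y f(x)$, with labels $y \in \{-1, +1\}$. The definitions below are the standard ones; the notation $V$, $y$, and $f(x)$ is used here for illustration and may differ from the paper's own.

```latex
V_{\text{hinge}}(y, f(x))    = \max\{0,\; 1 - y f(x)\}
V_{\text{logistic}}(y, f(x)) = \log\bigl(1 + e^{-y f(x)}\bigr)
V_{\text{square}}(y, f(x))   = \bigl(1 - y f(x)\bigr)^{2} = \bigl(y - f(x)\bigr)^{2}
```

All three are convex in $f(x)$, which is the convexity assumption the analysis relies on; they differ in how heavily they penalize points far from the decision boundary, which is what drives the differing convergence rates discussed in the abstract.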