Generalization of Cross-Entropy Loss Function for Image Classification

Abstract
Classification task is one of the most common tasks in machine learning. This supervised learning problem consists in assigning each input to one of a finite number of discrete categories. Classification task appears naturally in numerous applications, such as medical image processing, speech recognition, maintenance systems, accident detection, autonomous driving etc.In the last decade methods of deep learning have proven to be extremely efficient in multiple machine learning problems, including classification. Whereas the neural network architecture might depend a lot on data type and restrictions posed by the nature of the problem (for example, real-time applications), the process of its training (i.e. finding model’s parameters) is almost always presented as loss function optimization problem.Cross-entropy is a loss function often used for multiclass classification problems, as it allows to achieve high accuracy results.Here we propose to use a generalized version of this loss based on Renyi divergence and entropy. We remark that in case of binary labels proposed generalization is reduced to cross-entropy, thus we work in the context of soft labels. Specifically, we consider a problem of image classification being solved by application of convolution neural networks with mixup regularizer. The latter expands the training set by taking convex combination of pairs of data samples and corresponding labels. Consequently, labels are no longer binary (corresponding to single class), but have a form of vector of probabilities. In such settings cross-entropy and proposed generalization with Renyi divergence and entropy are distinct, and their comparison makes sense.To measure effectiveness of the proposed loss function we consider image classification problem on benchmark CIFAR-10 dataset. This dataset consists of 60000 images belonging to 10 classes, where images are color and have the size of 32×32. Training set consists of 50000 images, and the test set contains 10000 images.For the convolution neural network, we follow [1] where the same classification task was studied with respect to different loss functions and consider the same neural network architecture in order to obtain comparable results.Experiments demonstrate superiority of the proposed method over cross-entropy for loss function parameter value α < 1. For parameter value α > 1 proposed method shows worse results than cross-entropy loss function. Finally, parameter value α = 1 corresponds to cross-entropy.