Using additive noise in back-propagation training

Abstract
We discuss the possibility of improving the generalization capability of a neural network by introducing additive noise to the training samples. The network considered is a feedforward layered neural network trained with the back-propagation algorithm. Back-propagation training is viewed as nonlinear least-squares regression, and the additive noise is interpreted as generating a kernel estimate of the probability density that describes the training vector distribution. Two specific application types are considered: pattern classifier networks and the estimation of a nonstochastic mapping from data corrupted by measurement errors. We do not prove that the introduction of additive noise to the training vectors always improves network generalization. However, our analysis suggests mathematically justified rules for choosing the characteristics of the noise when additive noise is used in training. Further, drawing on results from mathematical statistics, we establish various asymptotic consistency results for the proposed method. We also report numerical simulations that support the applicability of the suggested training method.
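To make the training scheme described above concrete, the following is a minimal sketch (not the paper's own code) of least-squares back-propagation in which each input vector is perturbed by fresh Gaussian noise at every epoch; the network size, the noise standard deviation `h` (playing the role of the kernel bandwidth in the density-estimation interpretation), and all variable names are our illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_with_additive_noise(X, Y, hidden=16, h=0.1, lr=0.05, epochs=500):
    """Least-squares backprop training on noise-perturbed copies of the data.

    At every epoch each training input x is replaced by x + e, where e is
    zero-mean Gaussian noise with standard deviation h (the kernel bandwidth
    in the kernel density-estimation interpretation of additive noise).
    """
    n, d = X.shape
    _, m = Y.shape
    # One hidden layer with tanh units, linear output layer.
    W1 = rng.normal(scale=1.0 / np.sqrt(d), size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=1.0 / np.sqrt(hidden), size=(hidden, m))
    b2 = np.zeros(m)

    for _ in range(epochs):
        # Additive noise: a freshly smoothed sample of the training inputs.
        Xn = X + h * rng.standard_normal(X.shape)

        # Forward pass.
        H = np.tanh(Xn @ W1 + b1)
        out = H @ W2 + b2

        # Backward pass for the mean squared-error criterion.
        err = out - Y
        gW2 = H.T @ err / n
        gb2 = err.mean(axis=0)
        dH = (err @ W2.T) * (1.0 - H ** 2)
        gW1 = Xn.T @ dH / n
        gb1 = dH.mean(axis=0)

        # Gradient-descent updates.
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2

    return W1, b1, W2, b2

# Example: fitting a smooth 1-D mapping from measurements with small errors.
X = np.linspace(-1, 1, 40).reshape(-1, 1)
Y = np.sin(np.pi * X) + 0.05 * rng.standard_normal(X.shape)
params = train_with_additive_noise(X, Y, h=0.1)
```

In this reading, drawing a new noise realization per epoch amounts to sampling from the kernel-smoothed training distribution, which is the interpretation the abstract refers to; the choice of `h` corresponds to the noise characteristics that the paper's rules address.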