Linear function neurons: Structure and training

Abstract
Three different representations for a thresholded linear equation are developed. For binary inputs they are shown to be representationally equivalent, though their training characteristics differ. A training algorithm for linear equations is discussed. The similarities between its simplest mathematical representation (perceptron training), a formal model of animal learning (Rescorla-Wagner learning), and one mechanism of neural learning (Aplysia gill withdrawal) are pointed out. For d input features, perceptron training is shown to have a lower bound of 2^d and an upper bound of d^d adjustments. It is possible that the true upper bound is 4^d, though this has not been proved. Average performance is shown to have a lower bound of 1.4^d. Learning time is shown to increase linearly with the number of irrelevant or replicated features. The (X of N) function (a subset of the linearly separable functions that includes OR and AND) is shown to be learnable in d^3 time. A method of using conditional probability to accelerate learning is proposed. This reduces the observed growth rate from 4^d to the theoretical minimum (for the unmodified version) of 2^d. A different version reduces the growth rate to about 1.7^d. The linear effect of irrelevant features can also be eliminated. Whether such an approach can be made provably convergent is not known.
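
As an illustration of the cost measure referred to above, the sketch below trains a single thresholded linear unit on all 2^d binary inputs and counts weight adjustments until every example is classified correctly. It uses the standard fixed-increment perceptron rule and an "at least X of N" target concept; the function names, step size, and target choice are illustrative assumptions, not the paper's exact formulation, with (X of N) chosen only because the abstract cites it as a linearly separable case.

```python
# Minimal sketch: fixed-increment perceptron training on binary inputs,
# counting weight adjustments (the abstract's cost measure) to convergence.
from itertools import product

def x_of_n(x, bits):
    """Target concept: output 1 when at least x of the input bits are on."""
    return 1 if sum(bits) >= x else 0

def train_perceptron(d, target, max_epochs=10_000):
    w = [0.0] * d          # one weight per input feature
    theta = 0.0            # threshold, trained like a bias weight
    adjusts = 0            # number of weight adjustments made
    examples = list(product([0, 1], repeat=d))  # all 2^d binary inputs

    for _ in range(max_epochs):
        errors = 0
        for bits in examples:
            out = 1 if sum(wi * bi for wi, bi in zip(w, bits)) > theta else 0
            err = target(bits) - out          # +1, 0, or -1
            if err != 0:
                # Fixed-increment correction on the active inputs.
                for i, bi in enumerate(bits):
                    w[i] += err * bi
                theta -= err                  # threshold moves the other way
                adjusts += 1
                errors += 1
        if errors == 0:                       # all 2^d examples correct
            return w, theta, adjusts
    raise RuntimeError("did not converge within max_epochs")

if __name__ == "__main__":
    d = 5
    w, theta, adjusts = train_perceptron(d, lambda bits: x_of_n(3, bits))
    print(f"weights={w}, threshold={theta}, adjustments={adjusts}")
```

Because the (3 of 5) target is linearly separable, the perceptron convergence theorem guarantees the loop terminates; the reported adjustment count is the quantity whose growth with d the bounds above describe.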