Fast Exact Multiplication by the Hessian

Abstract
Just storing the Hessian $H$ (the matrix of second derivatives $\partial^2 E / \partial w_i \partial w_j$ of the error $E$ with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like $H$ is to compute its product with various vectors, we derive a technique that directly calculates $Hv$, where $v$ is an arbitrary vector. To calculate $Hv$, we first define a differential operator $R_v\{f(w)\} = (\partial/\partial r)\, f(w + rv)\big|_{r=0}$, note that $R_v\{\nabla_w\} = Hv$ and $R_v\{w\} = v$, and then apply $R_v\{\cdot\}$ to the equations used to compute $\nabla_w$. The result is an exact and numerically stable procedure for computing $Hv$, which takes about as much computation, and is about as local, as a gradient evaluation. We then apply the technique to a one-pass gradient calculation algorithm (backpropagation), a relaxation gradient calculation algorithm (recurrent backpropagation), and two stochastic gradient calculation algorithms (Boltzmann machines and weight perturbation). Finally, we show that this technique can be used at the heart of many iterative techniques for computing various properties of $H$, obviating any need to calculate the full Hessian.
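
Since $R_v\{\nabla_w\} = Hv$ is just a directional (forward-mode) derivative of the gradient, a modern automatic-differentiation framework can realize exact Hessian-vector products by composing forward-over-reverse differentiation. The sketch below, in JAX, is not from the paper; the toy error function `E`, the helper name `hvp`, and the test vectors are illustrative assumptions used only to demonstrate the idea.

```python
# Minimal sketch of the R_v{.} operator via forward-over-reverse differentiation.
# Hv = (d/dr) grad E(w + r v) |_{r=0}, computed exactly without forming H.
import jax
import jax.numpy as jnp

def E(w):
    # Toy error function standing in for a network's error E(w) (an assumption).
    return jnp.sum(jnp.tanh(w) ** 2)

def hvp(f, w, v):
    # Apply forward-mode differentiation (jvp) to the reverse-mode gradient,
    # mirroring R_v{grad E} = Hv at roughly the cost of a gradient evaluation.
    return jax.jvp(jax.grad(f), (w,), (v,))[1]

w = jnp.array([0.3, -1.2, 0.7])
v = jnp.array([1.0, 0.0, -0.5])

print(hvp(E, w, v))           # exact Hv without storing H
print(jax.hessian(E)(w) @ v)  # check against the explicit Hessian (small case only)
```

Such a Hessian-vector product routine can then drive iterative methods (e.g. Lanczos or conjugate-gradient style algorithms) that probe properties of $H$ using only products $Hv$, in the spirit of the paper's final section.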