Fast Exact Multiplication by the Hessian
- 1 January 1994
- journal article
- Published by MIT Press in Neural Computation
- Vol. 6 (1), 147-160
- https://doi.org/10.1162/neco.1994.6.1.147
Abstract
Just storing the Hessian H (the matrix of second derivatives ∂²E/∂wᵢ∂wⱼ of the error E with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like H is to compute its product with various vectors, we derive a technique that directly calculates Hv, where v is an arbitrary vector. To calculate Hv, we first define a differential operator R_v{f(w)} = (∂/∂r) f(w + rv)|_{r=0}, note that R_v{∇_w} = Hv and R_v{w} = v, and then apply R_v{·} to the equations used to compute ∇_w. The result is an exact and numerically stable procedure for computing Hv, which takes about as much computation, and is about as local, as a gradient evaluation. We then apply the technique to a one-pass gradient calculation algorithm (backpropagation), a relaxation gradient calculation algorithm (recurrent backpropagation), and two stochastic gradient calculation algorithms (Boltzmann machines and weight perturbation). Finally, we show that this technique can be used at the heart of many iterative techniques for computing various properties of H, obviating any need to calculate the full Hessian.
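As an illustration of the R-operator idea, here is a minimal sketch (not the paper's code) for a one-layer network y = tanh(Wx) with squared error E = ½‖y − t‖². Applying R_v{·} to the forward and backward passes yields Hv exactly; the central-difference check (g(w + εv) − g(w − εv))/2ε ≈ Hv is included only as a numerical sanity test. All function names here are illustrative.

```python
import numpy as np

def gradient(W, x, t):
    """Ordinary backprop gradient dE/dW for E = 0.5*||tanh(Wx) - t||^2."""
    y = np.tanh(W @ x)
    d = (y - t) * (1 - y**2)       # error signal through tanh'
    return np.outer(d, x)

def hessian_vector(W, V, x, t):
    """Exact Hv via Pearlmutter's R-operator, with R{W} = V."""
    a = W @ x
    y = np.tanh(a)
    yp = 1 - y**2                  # tanh'(a)
    Ra = V @ x                     # R{a} = V x
    Ry = yp * Ra                   # R{y} = tanh'(a) R{a}
    Ryp = -2 * y * Ry              # R{1 - y^2}
    d = (y - t) * yp
    Rd = Ry * yp + (y - t) * Ryp   # product rule on the error signal
    return np.outer(Rd, x)         # R{dE/dW} = Hv (in matrix layout)

# Sanity check against a finite difference of the gradient.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
x = rng.normal(size=4)
t = rng.normal(size=3)

eps = 1e-5
fd = (gradient(W + eps * V, x, t) - gradient(W - eps * V, x, t)) / (2 * eps)
Hv = hessian_vector(W, V, x, t)
assert np.allclose(Hv, fd, atol=1e-6)
```

The R-pass costs roughly the same as one extra gradient evaluation and never forms H, which is the point of the paper; the finite-difference version, by contrast, is approximate and numerically delicate.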