Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof

Abstract

Convergence of the value-iteration-based heuristic dynamic programming (HDP) algorithm is proven for general nonlinear systems. That is, it is shown that HDP converges to the optimal control and to the optimal value function that solves the Hamilton-Jacobi-Bellman (HJB) equation appearing in infinite-horizon discrete-time (DT) nonlinear optimal control. It is assumed that, at each iteration, the value and action update equations can be solved exactly. Two standard neural networks (NNs) are used: a critic NN approximates the value function, and an action NN approximates the optimal control policy. This approach allows HDP to be implemented without knowing the internal dynamics of the system. The exact-solution assumption holds for some classes of nonlinear systems and, in particular, for the DT linear quadratic regulator (LQR), where the action is linear in the states, the value is quadratic in the states, and the NNs have zero approximation error. For the LQR, HDP may be implemented with two NNs without knowing the system A matrix. This fact is not generally appreciated in the folklore of HDP for the DT LQR, where only a single critic NN is generally used.
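
As an illustration of the LQR special case mentioned in the abstract, the following is a minimal sketch of the value-iteration recursion with a quadratic value V_i(x) = x^T P_i x and a linear action u_i(x) = -K_i x, started from P_0 = 0. It is an assumption-laden sketch, not the paper's implementation: it uses the A matrix explicitly and exact matrix algebra in place of the critic and action NNs, and the function name, the example system, and the tolerances are all hypothetical.

```python
import numpy as np


def hdp_lqr_value_iteration(A, B, Q, R, iters=5000, tol=1e-10):
    """Value iteration for x_{k+1} = A x_k + B u_k with stage cost x^T Q x + u^T R u.

    Hypothetical helper: keeps V_i(x) = x^T P_i x and u_i(x) = -K_i x, with P_0 = 0.
    """
    n = A.shape[0]
    P = np.zeros((n, n))  # P_0 = 0, i.e. V_0(x) = 0
    for _ in range(iters):
        # Action update: minimize x^T Q x + u^T R u + V_i(A x + B u) over u.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Value update: V_{i+1}(x) = x^T Q x + u_i^T R u_i + V_i(A x + B u_i).
        A_cl = A - B @ K
        P_next = Q + K.T @ R @ K + A_cl.T @ P @ A_cl
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    # Gain that is greedy with respect to the final value estimate.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K


if __name__ == "__main__":
    # Assumed example: a discretized double integrator (not from the paper).
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [0.1]])
    Q = np.eye(2)
    R = np.array([[1.0]])
    P, K = hdp_lqr_value_iteration(A, B, Q, R)
    print("P =", P)
    print("K =", K)
```

In this sketch the fixed point of the recursion satisfies the discrete-time algebraic Riccati equation, which is the LQR form of the HJB equation. Unlike the two-network scheme described in the abstract, the sketch is model-based and needs A.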

Cited by 829 articles