Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof
- 27 June 2008
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)
- Vol. 38 (4), 943-949
- https://doi.org/10.1109/tsmcb.2008.926614
Abstract
Convergence of the value-iteration-based heuristic dynamic programming (HDP) algorithm is proven in the case of general nonlinear systems. That is, it is shown that HDP converges to the optimal control and the optimal value function that solves the Hamilton-Jacobi-Bellman equation appearing in infinite-horizon discrete-time (DT) nonlinear optimal control. It is assumed that, at each iteration, the value and action update equations can be exactly solved. Two standard neural networks (NNs) are used: a critic NN approximates the value function, whereas an action NN approximates the optimal control policy. It is stressed that this approach allows the implementation of HDP without knowing the internal dynamics of the system. The exact solution assumption holds for some classes of nonlinear systems and, in particular, for the DT linear quadratic regulator (LQR), where the action is linear and the value is quadratic in the states, so the NNs have zero approximation error. It is stressed that, for the LQR, HDP may be implemented without knowing the system A matrix by using two NNs. This fact is not generally appreciated in the folklore of HDP for the DT LQR, where only one critic NN is generally used.
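The LQR special case mentioned in the abstract can be illustrated with a minimal sketch of value iteration, assuming the standard quadratic parameterization V_i(x) = x'P_i x and starting from V_0 = 0. This is not the paper's neural-network implementation: the sketch is model-based and uses A and B directly, whereas the abstract stresses that the two-NN scheme avoids knowledge of A. The function name `hdp_lqr_value_iteration`, the example system (a discretized double integrator), and the weights Q and R are all assumptions chosen for illustration.

```python
import numpy as np

def hdp_lqr_value_iteration(A, B, Q, R, iters=5000, tol=1e-10):
    """Value iteration for the DT LQR with V_i(x) = x' P_i x and V_0 = 0.

    Hypothetical helper for illustration; assumes A and B are known.
    """
    n = A.shape[0]
    P = np.zeros((n, n))  # V_0(x) = 0
    for _ in range(iters):
        # Action update: u_i(x) = -K x minimizes x'Qx + u'Ru + V_i(Ax + Bu).
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Value update: V_{i+1}(x) = x'Qx + u_i'R u_i + V_i(Ax + B u_i).
        A_cl = A - B @ K
        P_next = Q + K.T @ R @ K + A_cl.T @ P @ A_cl
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    # Gain that is greedy with respect to the final value function.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K

# Assumed example system: double integrator sampled at 0.1 s.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

P, K = hdp_lqr_value_iteration(A, B, Q, R)
print("P =\n", P)
print("K =", K)
```

Under these assumptions the iterates P_i should approach the solution of the discrete-time algebraic Riccati equation, and the resulting gain K should stabilize the closed loop A - BK. Both properties can be checked numerically for the example system.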