Continuous-Time Adaptive Critics
- 7 May 2007
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Neural Networks
- Vol. 18 (3), 631-647
- https://doi.org/10.1109/tnn.2006.889499
Abstract
A continuous-time formulation of an adaptive critic design (ACD) is investigated. Connections are made to the discrete case, where backpropagation through time (BPTT) and real-time recurrent learning (RTRL) are prevalent. The practical benefits are that this framework fits well with plant descriptions given by differential equations, and that any standard integration routine with adaptive step size performs adaptive sampling for free. A second-order actor adaptation using Newton's method is established for fast actor convergence for a general plant and critic. In addition, a fast critic update for concurrent actor-critic training is introduced: it immediately applies the adjustments of critic parameters induced by actor updates, keeping the Bellman optimality correct to a first-order approximation after actor changes. Critic and actor updates may therefore be performed at the same time until substantial error builds up in the Bellman optimality or temporal difference equation. At that point a traditional critic training needs to be performed, after which another interval of concurrent actor-critic training may resume.
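To make the actor-critic interplay concrete, here is a minimal sketch under strong simplifying assumptions: a scalar linear plant dx/dt = a*x + b*u, a running cost q*x^2 + rho*u^2, a quadratic critic V(x) = p*x^2, and a linear actor u = -k*x. All function names and parameters are hypothetical. The critic step solves the continuous-time Bellman equation for the current policy in closed form, and the actor step is one Newton update of k on the Hamiltonian with the critic held fixed. This is essentially Kleinman-style continuous-time policy iteration for scalar LQR, not the article's general algorithm, which targets general plants and critics and also trains them concurrently.

```python
import math


def evaluate_critic(k, a, b, q, rho):
    """Critic step (hypothetical helper): return p such that V(x) = p*x**2
    satisfies the continuous-time Bellman equation for the fixed policy u = -k*x:
        q*x**2 + rho*u**2 + dV/dx * (a*x + b*u) = 0.
    Substituting u = -k*x gives (q + rho*k**2) + 2*p*(a - b*k) = 0.
    Requires a - b*k < 0 (stable closed loop) so the value is finite."""
    closed_loop = a - b * k
    if closed_loop >= 0:
        raise ValueError("policy is not stabilizing; value is unbounded")
    return (q + rho * k ** 2) / (-2.0 * closed_loop)


def newton_actor_step(k, p, a, b, q, rho):
    """Actor step (hypothetical helper): one Newton update of the gain k on
    the Hamiltonian per unit x**2, holding the critic parameter p fixed:
        h(k) = q + rho*k**2 + 2*p*(a - b*k)."""
    grad = 2.0 * rho * k - 2.0 * p * b  # dh/dk
    hess = 2.0 * rho                    # d2h/dk2, constant because h is quadratic in k
    return k - grad / hess


def train(a, b, q, rho, k0, tol=1e-12, max_iter=50):
    """Alternate critic evaluation and Newton actor updates until k settles."""
    k = k0
    for _ in range(max_iter):
        p = evaluate_critic(k, a, b, q, rho)
        k_new = newton_actor_step(k, p, a, b, q, rho)
        if abs(k_new - k) < tol:
            return k_new, evaluate_critic(k_new, a, b, q, rho)
        k = k_new
    return k, evaluate_critic(k, a, b, q, rho)


def riccati_solution(a, b, q, rho):
    """Positive root of the scalar algebraic Riccati equation
    2*a*p - (b**2/rho)*p**2 + q = 0, used only as a reference value."""
    return rho * (a + math.sqrt(a * a + b * b * q / rho)) / (b * b)


if __name__ == "__main__":
    a, b, q, rho = 1.0, 1.0, 1.0, 1.0   # open-loop unstable plant
    k, p = train(a, b, q, rho, k0=2.0)  # the initial gain must stabilize: b*k0 > a
    print(f"learned gain k = {k:.6f}, critic p = {p:.6f}")
    print(f"Riccati p*     = {riccati_solution(a, b, q, rho):.6f}")
```

Because h is quadratic in k in this toy setting, each Newton step lands exactly on the actor minimizer for the current critic; with a general plant and critic the Hessian is no longer constant and the step is only a local second-order approximation. For a = b = q = rho = 1 the iteration approaches p = 1 + sqrt(2) within a few steps.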