Continuous-Time Adaptive Critics

7 May 2007

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Neural Networks

Vol. 18 (3), 631-647
https://doi.org/10.1109/tnn.2006.889499

Abstract

A continuous-time formulation of an adaptive critic design (ACD) is investigated. Connections are made to the discrete-time case, where backpropagation through time (BPTT) and real-time recurrent learning (RTRL) are prevalent. The practical benefits are that this framework fits well with plant descriptions given by differential equations, and that any standard integration routine with adaptive step size performs adaptive sampling for free. A second-order actor adaptation using Newton's method is established for fast actor convergence for a general plant and critic. In addition, a fast critic update for concurrent actor-critic training is introduced: it immediately applies the adjustments of critic parameters made necessary by actor updates, so that the Bellman optimality equation remains correct to first-order approximation after actor changes. Thus, critic and actor updates may be performed at the same time until a substantial error builds up in the Bellman optimality or temporal difference equation; at that point traditional critic training needs to be performed, after which another interval of concurrent actor-critic training may resume.
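
The remark about adaptive sampling can be illustrated with a minimal sketch. It is not taken from the article: the scalar plant dx/dt = -x + u, the fixed linear actor u = -k x, the quadratic cost rate x^2 + u^2, and the names `plant_and_cost` and `integrate_adaptive` are all assumptions chosen for illustration, and a simple embedded Heun-Euler pair stands in for "any standard integration routine". The accumulated cost J is integrated as an extra state alongside the plant, and the step-size controller decides where the trajectory is sampled.

```python
import math

def plant_and_cost(t, z, k=1.0):
    """Augmented dynamics (hypothetical example): state x and accumulated cost J."""
    x, J = z
    u = -k * x                       # assumed fixed linear actor
    return (-x + u, x * x + u * u)   # (dx/dt, dJ/dt)

def integrate_adaptive(f, z0, t0, t1, tol=1e-6, h0=1e-2):
    """Embedded Heun-Euler integrator with a basic step-size controller."""
    t, z, h = t0, tuple(z0), h0
    times, states = [t], [z]
    while t < t1:
        h = min(h, t1 - t)           # do not step past the end time
        k1 = f(t, z)
        z_euler = tuple(zi + h * a for zi, a in zip(z, k1))
        k2 = f(t + h, z_euler)
        z_heun = tuple(zi + 0.5 * h * (a + b) for zi, a, b in zip(z, k1, k2))
        # Difference between the two estimates serves as the local error estimate.
        err = max(abs(a - b) for a, b in zip(z_heun, z_euler))
        if err <= tol:               # accept the step and record a sample
            t += h
            z = z_heun
            times.append(t)
            states.append(z)
        # Shrink or grow the step; growth and shrinkage are clamped.
        scale = 0.9 * math.sqrt(tol / err) if err > 0 else 2.0
        h *= min(2.0, max(0.2, scale))
    return times, states

times, states = integrate_adaptive(plant_and_cost, (1.0, 0.0), 0.0, 5.0)
steps = [b - a for a, b in zip(times, times[1:])]
print(f"{len(times)} samples, first step {steps[0]:.2e}, largest step {max(steps):.2e}")
print(f"accumulated cost J(5) = {states[-1][1]:.6f}")
```

In this sketch the samples cluster near t = 0, where the state changes quickly, and spread out as the trajectory settles, so the sampling density follows the dynamics without any separate sampling logic.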

This publication has 17 references indexed in Scilit:

Online Adaptive Critic Flight Control
Journal of Guidance, Control, and Dynamics, 2004
Adaptive NN control of uncertain nonlinear pure-feedback systems
Automatica, 2002
Neurocontroller alternatives for "fuzzy" ball-and-beam systems with nonuniform nonlinear friction
IEEE Transactions on Neural Networks, 2000
Adaptive neural network control of nonlinear systems by state and output feedback
IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 1999
An analysis of temporal-difference learning with function approximation
IEEE Transactions on Automatic Control, 1997
A neighboring optimal adaptive critic for missile guidance
Mathematical and Computer Modelling, 1996
Residual Algorithms: Reinforcement Learning with Function Approximation
Published by Elsevier BV, 1995
The convergence of TD(λ) for general λ
Machine Learning, 1992
A Learning Algorithm for Continually Running Fully Recurrent Neural Networks
Neural Computation, 1989
Learning to predict by the methods of temporal differences
Machine Learning, 1988

Cited by 94 articles