Online policy iteration based algorithms to solve the continuous-time infinite horizon optimal control problem
- 1 March 2009
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning
Abstract
In this paper we discuss two online algorithms based on policy iteration for learning the continuous-time (CT) optimal control solution for nonlinear systems with an infinite horizon quadratic cost. For the first time we present an online adaptive algorithm, implemented on an actor/critic structure, which involves synchronous continuous-time adaptation of both actor and critic neural networks. This is a version of generalized policy iteration for CT systems. Convergence to the optimal controller under the novel algorithm is proven while stability of the system is guaranteed. The characteristics and requirements of the new online learning algorithm are discussed in relation to the regular online policy iteration algorithm for CT systems which we have previously developed. The latter solves the optimal control problem by performing sequential updates on the actor and critic networks, i.e. while one is learning, the other is held constant. In contrast, the new algorithm relies on simultaneous adaptation of both actor and critic networks. A simulation example is then considered to support the new theoretical result.
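The synchronous actor/critic scheme described above can be illustrated on a problem where the optimal solution is known in closed form. The sketch below is an assumption-laden toy, not the authors' algorithm: it uses a scalar linear system dx/dt = a·x + b·u with cost ∫(q·x² + r·u²) dt, a one-parameter critic V(x) ≈ p·x² and a one-parameter actor u = −k·x in place of neural networks, and simple gradient steps on the Hamiltonian (Bellman) residual, with both weights updated at every step rather than in alternating phases. All symbols (a, b, q, r, alpha_c, alpha_a) are illustrative choices.

```python
import numpy as np

# Toy CT LQR problem: dx/dt = a*x + b*u, cost integral of (q*x^2 + r*u^2).
# The optimal value is V(x) = p*x^2 with p solving the scalar CT Riccati
# equation 2*a*p - (b^2/r)*p^2 + q = 0; used here only to check convergence.
a, b, q, r = -1.0, 1.0, 1.0, 1.0
p_star = -1.0 + np.sqrt(2.0)        # analytic Riccati root

p = 0.0                             # critic weight: V(x) ~= p*x^2
k = 0.0                             # actor weight:  u = -k*x
alpha_c, alpha_a = 0.002, 0.05      # illustrative learning rates
rng = np.random.default_rng(0)

for step in range(20000):
    x = rng.uniform(-2.0, 2.0)      # exploratory (persistently exciting) states
    u = -k * x
    # Hamiltonian residual for the current weights:
    #   H = q*x^2 + r*u^2 + dV/dx * (a*x + b*u),  with dV/dx = 2*p*x
    H = q * x**2 + r * u**2 + 2.0 * p * x * (a * x + b * u)
    # Critic and actor are adapted simultaneously (the "synchronous" idea):
    # critic does gradient descent on H^2 with respect to p ...
    p -= alpha_c * H * 2.0 * x * (a * x + b * u)
    # ... while the actor gain moves toward the H-minimizing control
    # u = -(b/r)*p*x implied by the *current* critic estimate.
    k += alpha_a * ((b / r) * p - k)

print(p, p_star, k)
```

At the joint fixed point the residual H vanishes for every x and the critic weight satisfies the Riccati equation, so both p and the actor gain k converge to p* ≈ 0.414. The contrast with the sequential (regular policy iteration) scheme is the interleaving: here neither weight is frozen while the other adapts.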