Integral Reinforcement Learning for Continuous-Time Input-Affine Nonlinear Systems With Simultaneous Invariant Explorations

Abstract

This paper focuses on a class of reinforcement learning (RL) algorithms, named integral RL (I-RL), that solve continuous-time (CT) nonlinear optimal control problems with input-affine system dynamics. First, we extend the concepts of exploration, integral temporal difference, and invariant admissibility to the target CT nonlinear system, which is driven by a control policy plus a probing signal called an exploration. Then, we show input-to-state stability (ISS) and invariant admissibility of the closed-loop systems under the policies generated by either the integral policy iteration (I-PI) method or the invariantly admissible PI (IA-PI) method. Based on these results, we propose three online I-RL algorithms, named explorized I-PI and integral $Q$-learning I and II. Under the required excitation condition on the exploration, all of them generate the same convergent sequences as I-PI and IA-PI. All the proposed methods are partially or completely model-free, and they can explore the state space in a stable manner while learning online. We also investigate the ISS, invariant admissibility, and convergence properties of the proposed methods and, in connection with these, present design principles for the exploration that ensure safe learning. Neural-network-based implementations of the proposed schemes are also presented. Finally, several numerical simulations verify the effectiveness of the proposed methods.
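To make the idea of learning from an explored closed loop concrete, here is a minimal sketch restricted to the linear-quadratic special case ($\dot{x} = Ax + Bu$, cost $x^\top Q x + u^\top R u$), not the paper's input-affine nonlinear setting with neural-network approximators. The plant matrices, weights, exploration frequencies, interval length $T$, and the function names (`explorized_ipi`, `collect_data`, `kleinman_reference`) are all hypothetical choices for illustration. When the policy $u_i = -K_i x$ is applied together with an exploration $e$, the quadratic value $V_i(x) = x^\top P_i x$ satisfies $V_i(x(t)) - V_i(x(t+T)) + 2\int_t^{t+T} e^\top R K_{i+1} x \, d\tau = \int_t^{t+T} x^\top (Q + K_i^\top R K_i) x \, d\tau$ with $K_{i+1} = R^{-1} B^\top P_i$. This identity is linear in the unknowns $P_i$ and $K_{i+1}$, so a single least-squares solve per iteration performs both policy evaluation and policy improvement using only measured data. This is our own illustration of the exploration idea, not the authors' algorithm.

```python
import numpy as np

# Hypothetical example plant x_dot = A x + B u (not taken from the paper).
# A and B are used ONLY inside the simulator; the learning update never reads them.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])   # Hurwitz, so K0 = 0 is an admissible start
B = np.array([[0.0], [1.0]])
Q = np.eye(2)                               # state weight
R = np.array([[1.0]])                       # input weight


def exploration(t):
    """Probing signal e(t): an assumed sum of sinusoids chosen for excitation."""
    return np.array([0.5 * sum(np.sin(w * t) for w in (1.0, 2.7, 5.3, 9.1))])


def phi(x):
    """Quadratic basis so that x' P x = phi(x) @ [p11, p12, p22]."""
    return np.array([x[0] ** 2, 2.0 * x[0] * x[1], x[1] ** 2])


def augmented_rhs(t, z, K, Qi):
    """z = [x1, x2, running cost, running 2 R e x1, running 2 R e x2]."""
    x = z[:2]
    e = exploration(t)
    u = -K @ x + e                       # applied input: current policy plus exploration
    dx = A @ x + B @ u                   # plant dynamics (simulator side only)
    dcost = x @ Qi @ x                   # x'(Q + K'RK)x: cost of the target policy -K x
    dint = 2.0 * R[0, 0] * e[0] * x      # multiplies the unknown next gain K_{i+1}
    return np.concatenate([dx, [dcost], dint])


def rk4_step(t, z, h, K, Qi):
    """Classical RK4 step; the integrals are integrated jointly with the state."""
    k1 = augmented_rhs(t, z, K, Qi)
    k2 = augmented_rhs(t + h / 2, z + h / 2 * k1, K, Qi)
    k3 = augmented_rhs(t + h / 2, z + h / 2 * k2, K, Qi)
    k4 = augmented_rhs(t + h, z + h * k3, K, Qi)
    return z + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def collect_data(K, x, t, T=0.1, h=0.01, n_intervals=60):
    """Run the explored closed loop; build one integral-TD row per interval [t, t+T]."""
    Qi = Q + K.T @ R @ K
    steps = int(round(T / h))
    rows, targets = [], []
    for _ in range(n_intervals):
        z = np.concatenate([x, np.zeros(3)])   # reset the running integrals
        for _ in range(steps):
            z = rk4_step(t, z, h, K, Qi)
            t += h
        x_end = z[:2]
        # V(x(t)) - V(x(t+T)) + K_{i+1} . int(2 R e x) = int x'(Q + K_i'RK_i)x
        rows.append(np.concatenate([phi(x) - phi(x_end), z[3:]]))
        targets.append(z[2])
        x = x_end
    return np.array(rows), np.array(targets), x, t


def explorized_ipi(K0, x0, n_iter=8):
    """Data-driven PI: each least-squares solve yields P_i and K_{i+1} together."""
    K, x, t = K0.copy(), x0.copy(), 0.0
    for _ in range(n_iter):
        Phi, y, x, t = collect_data(K, x, t)
        theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        P = np.array([[theta[0], theta[1]], [theta[1], theta[2]]])
        K = theta[3:].reshape(1, 2)
    return P, K


def kleinman_reference(K0, n_iter=20):
    """Model-based PI (uses A and B), used only to check the learned result."""
    n, I = A.shape[0], np.eye(A.shape[0])
    K = K0.copy()
    for _ in range(n_iter):
        Ai, Qi = A - B @ K, Q + K.T @ R @ K
        M = np.kron(I, Ai.T) + np.kron(Ai.T, I)   # vec(Ai'P + P Ai), column-major vec
        P = np.linalg.solve(M, -Qi.flatten(order="F")).reshape((n, n), order="F")
        K = np.linalg.solve(R, B.T @ P)
    return P, K


P_learned, K_learned = explorized_ipi(np.zeros((1, 2)), np.array([1.0, -1.0]))
P_ref, K_ref = kleinman_reference(np.zeros((1, 2)))
print(np.round(P_learned, 4))
print(np.round(P_ref, 4))
```

With accurate data, the learned $P$ and $K$ should agree with the model-based policy-iteration result up to integration and least-squares error. If the exploration is too weak or too narrow-band, the regression matrix loses rank, which is the linear-quadratic counterpart of the excitation condition on the exploration mentioned above.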

Funding Information

Institute of BioMed-IT, Energy-IT and Smart-IT Technology (BEST), a Brain Korea 21 plus program at Yonsei University
Basic Science Research Program through the National Research Foundation of Korea (NRF)
Ministry of Education (No. NRF-2013R1A1A2012609)

Cited by 79 articles