Reinforcement Learning for Partially Observable Dynamic Processes: Adaptive Dynamic Programming Using Measured Output Data

Abstract

Approximate dynamic programming (ADP) is a class of reinforcement learning methods that have proven important in a variety of applications, including feedback control of dynamical systems. ADP generally requires full information about the internal states of the system, which is usually not available in practice. In this paper, we show how to implement ADP methods using only measured input/output data from the system; in control system theory, such methods are referred to as output feedback (OPFB). We consider deterministic linear dynamical systems, which are of great interest in the control system community. The stochastic equivalent of these systems is a class of partially observable Markov decision processes. We develop both policy iteration and value iteration algorithms that converge to an optimal controller requiring only OPFB. As with Q-learning, the new methods have the important advantage that knowledge of the system dynamics is not needed to implement the learning algorithms or the OPFB control. Only the order of the system, as well as an upper bound on its "observability index," must be known. The learned OPFB controller takes the form of a polynomial autoregressive moving-average controller whose performance is equivalent to that of the optimal state variable feedback gain.
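
The premise that measured input/output data can stand in for the internal state can be checked numerically. The following is a minimal sketch, not the algorithm of the paper: it assumes a small hypothetical observable system (the matrices A, B, C and the function name io_history_reconstructs_state are illustrative choices, not taken from the source) and regresses the true state on a window of the last N inputs and outputs. The state is used here only to verify the fit. For a deterministic linear system, the residual should vanish once N reaches the observability index and should not vanish for a shorter window.

```python
import numpy as np

def io_history_reconstructs_state(A, B, C, N, steps=200, seed=0):
    """Regress x_k on the last N inputs and outputs; return the max abs residual.

    Hypothetical helper for illustration: the true state is used only to
    check that a linear map from I/O history to state exists.
    """
    rng = np.random.default_rng(seed)
    n, m = B.shape
    x = rng.standard_normal(n)
    xs, us, ys = [], [], []
    for _ in range(steps):
        u = rng.standard_normal(m)      # random probing input for excitation
        xs.append(x)
        us.append(u)
        ys.append(C @ x)                # measured output y_k = C x_k
        x = A @ x + B @ u               # deterministic linear dynamics
    rows, targets = [], []
    for k in range(N, steps):
        # z_k stacks u_{k-1}, ..., u_{k-N} followed by y_{k-1}, ..., y_{k-N}
        z = np.concatenate(us[k - N:k][::-1] + ys[k - N:k][::-1])
        rows.append(z)
        targets.append(xs[k])
    Z = np.array(rows)
    X = np.array(targets)
    # Least-squares fit of x_k = M^T z_k over all collected samples
    M, *_ = np.linalg.lstsq(Z, X, rcond=None)
    return float(np.max(np.abs(Z @ M - X)))

# Assumed example system: order 2, single input, single output,
# observable with observability index 2.
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])

residual_full = io_history_reconstructs_state(A, B, C, N=2)   # window covers the index
residual_short = io_history_reconstructs_state(A, B, C, N=1)  # window too short
```

In this sketch only the window length N changes between the two calls, which mirrors the abstract's point that an upper bound on the observability index must be known before the output-feedback formulation can replace state feedback.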

Cited by 377 articles