Reinforcement Learning for Partially Observable Dynamic Processes: Adaptive Dynamic Programming Using Measured Output Data
- 29 March 2010
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)
- Vol. 41 (1), 14-25
- https://doi.org/10.1109/tsmcb.2010.2043839
Abstract
Approximate dynamic programming (ADP) is a class of reinforcement learning methods that have proven important in a variety of applications, including feedback control of dynamical systems. ADP generally requires full information about the system's internal states, which is usually not available in practical situations. In this paper, we show how to implement ADP methods using only measured input/output data from the system. We consider deterministic linear dynamical systems, a class of great interest in the control systems community; in control theory, such methods are referred to as output feedback (OPFB). The stochastic counterpart of the systems treated here is a class of partially observable Markov decision processes. We develop both policy iteration and value iteration algorithms that converge to an optimal controller requiring only OPFB. It is shown that, as with Q-learning, the new methods have the important advantage that knowledge of the system dynamics is not needed to implement the learning algorithms or the OPFB control. Only the order of the system, together with an upper bound on its observability index, must be known. The learned OPFB controller takes the form of a polynomial autoregressive moving-average controller whose performance is equivalent to that of the optimal state-variable feedback gain.
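The idea underlying OPFB ADP of this kind is that, for an observable linear system, the current state can be reconstructed exactly from the last N measured inputs and outputs (N being the observability index), so value functions and controllers can be parameterized in measured data alone, with no state estimator. The sketch below illustrates that reconstruction numerically for a hypothetical second-order system; the matrices `A`, `B`, `C` are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Hypothetical observable 2nd-order system: x_{k+1} = A x_k + B u_k, y_k = C x_k
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
n = 2  # system order; here the observability index N equals n

# Stacked past outputs: [y_{k-N}; ...; y_{k-1}] = V x_{k-N} + T [u_{k-N}; ...; u_{k-1}]
V = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(n)])  # observability matrix
T = np.zeros((n, n))  # lower-triangular Toeplitz of Markov parameters C A^{i-1-j} B
for i in range(1, n):
    for j in range(i):
        T[i, j] = (C @ np.linalg.matrix_power(A, i - 1 - j) @ B).item()

# x_k = A^N x_{k-N} + U u_bar; eliminate the unmeasured x_{k-N} via pinv(V)
AN = np.linalg.matrix_power(A, n)
U = np.hstack([np.linalg.matrix_power(A, n - 1 - j) @ B for j in range(n)])
M_y = AN @ np.linalg.pinv(V)   # gain from past outputs
M_u = U - M_y @ T              # gain from past inputs

# Verify on simulated data: reconstruct x_k from measured I/O only
rng = np.random.default_rng(0)
x = rng.standard_normal(2)     # unknown state x_{k-N}
us, ys = [], []
for _ in range(n):
    u = rng.standard_normal()
    ys.append((C @ x).item())
    x = A @ x + B.flatten() * u  # x ends as x_k
    us.append(u)
x_hat = M_y @ np.array(ys) + M_u @ np.array(us)
print(np.allclose(x_hat, x))   # state recovered from I/O history alone
```

Because the state is an exact linear function of the I/O history, a quadratic value function in the state becomes a quadratic form in past inputs and outputs, which is what lets the paper's policy iteration and value iteration run on measured data without a system model.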