Goal Representation Heuristic Dynamic Programming on Maze Navigation

22 July 2013

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Neural Networks and Learning Systems

Vol. 24 (12), 2038-2050
https://doi.org/10.1109/tnnls.2013.2271454

Abstract

Goal representation heuristic dynamic programming (GrHDP) is proposed in this paper for online learning in Markov decision processes. In addition to the external reinforcement signal used in the literature, we develop an adaptive internal goal/reward representation for the agent through the proposed goal network. Specifically, we keep the actor-critic design of heuristic dynamic programming (HDP) and add a goal network that represents the internal goal signal and further helps the value function approximation. We evaluate the proposed GrHDP algorithm on two 2-D maze navigation problems and then on one 3-D maze navigation problem. Compared with the traditional HDP approach, the agent's learning performance improves with GrHDP. For comparison, we also report the learning performance of two other reinforcement learning algorithms, Sarsa(λ) and Q-learning, on the same benchmarks. Furthermore, to support the method theoretically, we analyze the convergence of the neural network weights in the GrHDP approach.
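
As a rough illustration of the three-network structure the abstract describes, the sketch below wires an actor, a goal network, and a critic together in plain Python. It is a minimal sketch under stated assumptions, not the authors' implementation: the class name `GrHDPSketch`, the layer sizes, the tanh activations, the discount `gamma`, and the temporal-difference-style error forms in `td_errors` are all hypothetical choices modeled on the usual HDP pattern, none of which are specified in the abstract. Weight updates are omitted.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return an n_out x n_in weight matrix with small random values."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def mlp(x, w_hidden, w_out):
    """One-hidden-layer network: tanh hidden units, linear outputs."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


class GrHDPSketch:
    """Forward pass only; all sizes and activations are assumptions."""

    def __init__(self, n_state, n_action, n_hidden=6, seed=0):
        rng = random.Random(seed)
        # Actor: state -> action
        self.actor = (make_layer(n_state, n_hidden, rng),
                      make_layer(n_hidden, n_action, rng))
        # Goal network: (state, action) -> internal goal signal s
        self.goal = (make_layer(n_state + n_action, n_hidden, rng),
                     make_layer(n_hidden, 1, rng))
        # Critic: (state, action, s) -> value estimate J
        self.critic = (make_layer(n_state + n_action + 1, n_hidden, rng),
                       make_layer(n_hidden, 1, rng))

    def forward(self, state):
        """Map a state (list of floats) to (action, internal signal, value)."""
        action = [math.tanh(a) for a in mlp(state, *self.actor)]
        s = mlp(state + action, *self.goal)[0]
        J = mlp(state + action + [s], *self.critic)[0]
        return action, s, J


def td_errors(s_prev, s_now, J_prev, J_now, r, gamma=0.95):
    """Assumed error signals: the goal network tracks the external
    reward r, and the critic tracks the internal signal s instead of r."""
    goal_err = gamma * s_now - (s_prev - r)
    critic_err = gamma * J_now - (J_prev - s_now)
    return goal_err, critic_err


if __name__ == "__main__":
    net = GrHDPSketch(n_state=2, n_action=1)
    action, s, J = net.forward([0.1, -0.2])
    print("action:", action, "internal signal:", s, "value:", J)
```

The point of the sketch is the data flow: the critic receives the goal network's output as an extra input, so the value estimate is driven by the learned internal signal rather than directly by the external reward.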

This publication has 34 references indexed in Scilit:

A boundedness result for the direct heuristic dynamic programming
Neural Networks, 2012
Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming
Automatica, 2012
A three-network architecture for on-line learning and optimization based on adaptive dynamic programming
Neurocomputing, 2011
Intelligence in the brain: A theory of how it works and how to build it
Neural Networks, 2009
Helicopter trimming and tracking control using direct neural dynamic programming
IEEE Transactions on Neural Networks, 2003
Online learning control by association and reinforcement
IEEE Transactions on Neural Networks, 2001
TD(λ) converges with probability 1
Machine Learning, 1994
The convergence of TD(λ) for general λ
Machine Learning, 1992
Consistency of HDP applied to a simple reinforcement learning problem
Neural Networks, 1990
A Stochastic Approximation Method
The Annals of Mathematical Statistics, 1951

Cited by 87 articles