Incremental multi-step Q-learning

Open Access

1 January 1996

journal article
Published by Springer Science and Business Media LLC in Machine Learning

Vol. 22 (1-3), 283-290
https://doi.org/10.1007/bf00114731

Abstract

This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic-programming based reinforcement learning method, with the TD(λ) return estimation process, which is typically used in actor-critic learning, another well-known dynamic-programming based reinforcement learning method. The parameter λ is used to distribute credit throughout sequences of actions, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization. The resulting algorithm, Q(λ)-learning, thus combines some of the best features of the Q-learning and actor-critic learning paradigms. The behavior of this algorithm has been demonstrated through computer simulations.
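
To make the role of λ concrete, the following is a minimal sketch of a tabular Q-learning update with accumulating eligibility traces. It is an illustrative simplification (sometimes called "naive" Q(λ)), not necessarily the exact update rule of the paper, which the abstract does not spell out. The function name `q_lambda_episode`, the transition tuple layout, and the default values for `alpha`, `gamma`, and `lam` are all assumptions chosen for this example.

```python
from collections import defaultdict


def q_lambda_episode(q, transitions, alpha=0.1, gamma=0.9, lam=0.8):
    """Update a tabular Q-function over one recorded episode (hypothetical helper).

    q           -- defaultdict(float) mapping (state, action) -> value estimate
    transitions -- list of (state, action, reward, next_state, next_actions);
                   next_actions is empty when next_state is terminal
    lam         -- trace-decay parameter; lam=0 reduces to one-step Q-learning
    """
    traces = defaultdict(float)  # eligibility of each visited (state, action)
    for state, action, reward, next_state, next_actions in transitions:
        # Greedy one-step bootstrap value, 0.0 at a terminal state.
        best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
        td_error = reward + gamma * best_next - q[(state, action)]

        # Mark the current pair as eligible for credit.
        traces[(state, action)] += 1.0

        # Distribute the TD error to all recently visited pairs,
        # then decay their eligibility by gamma * lam.
        for key in list(traces):
            q[key] += alpha * td_error * traces[key]
            traces[key] *= gamma * lam
    return q


# Assumed toy episode: two steps, reward only on the final transition.
episode = [
    ("s0", "a", 0.0, "s1", ["a"]),
    ("s1", "a", 1.0, "end", []),
]
q_values = q_lambda_episode(defaultdict(float), episode)
```

In this toy episode the final reward reaches the first state-action pair within a single pass because its trace is still non-zero. With `lam=0` only the last pair would be updated, which is the sense in which λ distributes credit along the action sequence.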

