Incremental multi-step Q-learning
Open Access
- 1 January 1996
- journal article
- Published by Springer Science and Business Media LLC in Machine Learning
- Vol. 22 (1-3), 283-290
- https://doi.org/10.1007/bf00114731
Abstract
This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic-programming-based reinforcement learning method, with the TD(λ) return-estimation process, which is typically used in actor-critic learning, another well-known dynamic-programming-based reinforcement learning method. The parameter λ is used to distribute credit throughout sequences of actions, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization. The resulting algorithm, Q(λ)-learning, thus combines some of the best features of the Q-learning and actor-critic learning paradigms. The behavior of this algorithm has been demonstrated through computer simulations.
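To illustrate the idea the abstract describes, here is a minimal sketch of a tabular Q(λ)-style update with eligibility traces. The environment (a toy 5-state chain), the hyperparameters, and the naive trace-decay rule are all illustrative assumptions, not details taken from the paper:

```python
import random

# Toy setup (assumed for illustration): a 5-state chain where the agent
# starts at state 0 and receives reward 1 on reaching the last state.
N_STATES, ACTIONS = 5, (0, 1)            # actions: 0 = left, 1 = right
ALPHA, GAMMA, LAM, EPS = 0.1, 0.9, 0.8, 0.1

def step(s, a):
    """Deterministic chain dynamics; terminal reward at the right end."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0), s2 == N_STATES - 1

def greedy(Q, s):
    return max(ACTIONS, key=lambda a: Q[(s, a)])

def q_lambda(episodes=200, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        e = {k: 0.0 for k in Q}          # eligibility traces, reset per episode
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            a = rng.choice(ACTIONS) if rng.random() < EPS else greedy(Q, s)
            s2, r, done = step(s, a)
            a_star = greedy(Q, s2)
            # one-step TD error toward the greedy successor value
            delta = r + (0.0 if done else GAMMA * Q[(s2, a_star)]) - Q[(s, a)]
            e[(s, a)] += 1.0             # accumulating trace for the visited pair
            for k in Q:
                # lambda distributes the TD error along the recent trajectory
                Q[k] += ALPHA * delta * e[k]
                # naive Q(lambda): traces decay every step (Watkins' variant
                # would additionally zero them after exploratory actions)
                e[k] *= GAMMA * LAM
            s = s2
    return Q

Q = q_lambda()
```

Because the traces let a single TD error update every recently visited state-action pair at once, credit for the terminal reward reaches earlier states much faster than one-step Q-learning would propagate it.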
This publication has 3 references indexed in Scilit:
- Fast and Efficient Reinforcement Learning with Truncated Temporal Differences, published by Elsevier BV, 1995
- Efficient Learning and Planning Within the Dyna Framework, Adaptive Behavior, 1993
- Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, published by Elsevier BV, 1990