A New Q-Learning Algorithm Based on the Metropolis Criterion
- 20 September 2004
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)
- Vol. 34 (5), 2140-2143
- https://doi.org/10.1109/tsmcb.2004.832154
Abstract
The balance between exploration and exploitation is one of the key problems of action selection in Q-learning. Pure exploitation causes the agent to converge quickly to locally optimal policies, whereas excessive exploration degrades the performance of the Q-learning algorithm, even though it may accelerate learning and help the agent avoid locally optimal policies. In this paper, finding the optimal policy in Q-learning is described as a search for the optimal solution in combinatorial optimization. The Metropolis criterion of the simulated annealing algorithm is introduced to balance exploration and exploitation in Q-learning, and the modified Q-learning algorithm based on this criterion, SA-Q-learning, is presented. Experiments show that SA-Q-learning converges more quickly than Q-learning or Boltzmann exploration, and that the search does not suffer from performance degradation due to excessive exploration.
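The abstract does not spell out the selection rule, so the following is only a minimal sketch of how a Metropolis-style acceptance test could drive action selection in tabular Q-learning. It assumes one plausible reading: a uniformly random candidate action is compared with the greedy action and accepted with probability exp((Q(s, candidate) - Q(s, greedy)) / T). The function names `metropolis_select` and `q_update`, the list-of-lists Q-table, and the default `alpha` and `gamma` values are illustrative choices, not taken from the paper.

```python
import math
import random


def metropolis_select(q_row, temperature, rng=random):
    """Pick an action index from one row of a Q-table.

    Assumed rule (not confirmed by the abstract): draw a uniformly random
    candidate action and accept it over the greedy action with probability
    exp((Q[candidate] - Q[greedy]) / temperature).
    """
    greedy = max(range(len(q_row)), key=lambda a: q_row[a])
    if temperature <= 0:
        # Zero temperature is treated as pure exploitation.
        return greedy
    candidate = rng.randrange(len(q_row))
    # Q[candidate] <= Q[greedy], so the exponent is <= 0 and the
    # acceptance probability lies in [0, 1].
    accept_prob = math.exp((q_row[candidate] - q_row[greedy]) / temperature)
    return candidate if rng.random() < accept_prob else greedy


def q_update(q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update on a list-of-lists Q-table."""
    target = reward + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
```

At a high temperature the candidate is almost always accepted, which amounts to near-uniform exploration. As the temperature falls toward zero the rule approaches greedy selection. A cooling schedule such as multiplying the temperature by a constant below 1 after each episode is one common choice in simulated annealing, and is likewise an assumption here.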
This publication has 10 references indexed in Scilit:
- Nature's way of optimizing. Artificial Intelligence, 2000
- Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 1999
- Cooperative behavior acquisition for mobile robots in dynamically changing real worlds via vision-based reinforcement learning and development. Artificial Intelligence, 1999
- Explanation-Based Learning and Reinforcement Learning: A Unified View. Machine Learning, 1997
- Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 1996
- Explanation-Based Learning and Reinforcement Learning: A Unified View. Published by Elsevier BV, 1995
- Q-learning. Machine Learning, 1992
- Learning to predict by the methods of temporal differences. Machine Learning, 1988
- Optimization by Simulated Annealing. Science, 1983
- Equation of State Calculations by Fast Computing Machines. The Journal of Chemical Physics, 1953