A New Q-Learning Algorithm Based on the Metropolis Criterion

20 September 2004

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)

Vol. 34 (5), 2140-2143
https://doi.org/10.1109/tsmcb.2004.832154

Abstract

The balance between exploration and exploitation is one of the key problems of action selection in Q-learning. Pure exploitation causes the agent to converge quickly to locally optimal policies, whereas excessive exploration degrades the performance of the Q-learning algorithm, even though it may accelerate learning and help avoid locally optimal policies. In this paper, finding the optimal policy in Q-learning is described as a search for the optimal solution in combinatorial optimization. The Metropolis criterion of the simulated annealing algorithm is introduced to balance exploration and exploitation in Q-learning, and the modified Q-learning algorithm based on this criterion, SA-Q-learning, is presented. Experiments show that SA-Q-learning converges more quickly than Q-learning or Boltzmann exploration, and that the search does not suffer from performance degradation due to excessive exploration.
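The abstract does not spell out the algorithm, so the following Python is only a minimal sketch of how a Metropolis-style acceptance test could drive action selection in tabular Q-learning. The acceptance rule (a uniformly random candidate action is accepted over the greedy one with probability exp((Q(s, a_candidate) - Q(s, a_greedy)) / T)), the geometric cooling schedule, and all function names and default parameters (`metropolis_select`, `q_update`, `cool`, `alpha`, `gamma`, `rate`, `floor`) are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def metropolis_select(q_row, temperature, rng=random):
    """Pick an action index from one state's Q-values with a Metropolis-style test.

    Assumed rule (not confirmed by the abstract): draw a random candidate
    action and accept it over the greedy action with probability
    exp((Q[candidate] - Q[greedy]) / temperature).
    """
    greedy = max(range(len(q_row)), key=lambda a: q_row[a])
    candidate = rng.randrange(len(q_row))
    # Q[candidate] <= Q[greedy], so the exponent is <= 0 and the
    # acceptance probability lies in [0, 1].
    accept_prob = math.exp((q_row[candidate] - q_row[greedy]) / temperature)
    return candidate if rng.random() < accept_prob else greedy


def q_update(q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update on a list-of-lists Q-table."""
    target = reward + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def cool(temperature, rate=0.95, floor=1e-3):
    """Geometric cooling schedule (an assumed choice), bounded below by floor."""
    return max(floor, temperature * rate)
```

At a high temperature the acceptance probability is close to 1, so selection is nearly uniform (exploration); as the temperature falls, lower-valued candidates are rejected more often and selection approaches the greedy policy (exploitation). In a training loop one would typically call `cool` once per episode.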

Cited by 111 articles