Adaptation Method of the Exploration Ratio Based on the Orientation of Equilibrium in Multi-Agent Reinforcement Learning Under Non-Stationary Environments

Abstract
In this paper, we propose a method to adapt the exploration ratio in multi-agent reinforcement learning. Adapting the exploration ratio is important in multi-agent learning because it is one of the key parameters affecting learning performance. In our observations, an adaptation method can adjust the exploration ratio suitably (though not optimally) according to the characteristics of the environment. We investigated the evolutionary adaptation of the exploration ratio in multi-agent learning. We first conducted several experiments that adapt the exploration ratio in a simple evolutionary way, namely mimicking the advantageous exploration ratio (MAER), and confirmed that MAER always acquires an exploration ratio lower than the value that is optimal for the rate of environmental change. We then propose a second evolutionary adaptation method, namely win or update exploration ratio (WoUE). The experimental results showed that WoUE acquires a more suitable exploration ratio than MAER, and the obtained ratio is near-optimal.
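The abstract does not specify the exact update rules, so the following is a minimal sketch of what evolutionary exploration-ratio adaptation of this flavor could look like. The `Agent` record, the `maer_step` and `woue_step` functions, the use of the population-average reward as the "win" criterion, and the perturbation parameters are all illustrative assumptions, not the authors' actual algorithms.

```python
import random

# Hypothetical agent record: an epsilon-greedy exploration ratio plus a
# recent average reward used to judge which ratios are "advantageous".
class Agent:
    def __init__(self, epsilon):
        self.epsilon = epsilon      # exploration ratio for epsilon-greedy action selection
        self.recent_reward = 0.0    # moving average of recent rewards (assumed fitness signal)

def clip01(x):
    """Keep the exploration ratio in the valid range [0, 1]."""
    return min(1.0, max(0.0, x))

def maer_step(agents, noise=0.01):
    """MAER-style update (assumed form): agents mimic the exploration
    ratio of the best-performing agent, with small mutation noise."""
    best = max(agents, key=lambda a: a.recent_reward)
    for a in agents:
        if a is not best:
            a.epsilon = clip01(best.epsilon + random.gauss(0.0, noise))

def woue_step(agents, step=0.05):
    """WoUE-style update (assumed form): a 'winning' agent (reward above
    the population average) keeps its ratio; a losing agent updates its
    ratio by a random local perturbation."""
    avg = sum(a.recent_reward for a in agents) / len(agents)
    for a in agents:
        if a.recent_reward < avg:   # losing: update the exploration ratio
            a.epsilon = clip01(a.epsilon + random.uniform(-step, step))
        # winning: keep the current exploration ratio unchanged

# Usage: adapt the ratios across a small population between learning phases.
agents = [Agent(epsilon=random.uniform(0.0, 0.5)) for _ in range(10)]
for a in agents:
    a.recent_reward = random.random()   # stand-in for measured performance
woue_step(agents)
print([round(a.epsilon, 3) for a in agents])
```

Under a rule like this, WoUE only perturbs the ratios of underperforming agents, which plausibly explains why it avoids MAER's collapse toward a single (too low) copied value.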