Ensemble Algorithms in Reinforcement Learning

Abstract
This paper describes several ensemble methods that combine multiple reinforcement learning (RL) algorithms in a single agent. The aim is to enhance learning speed and final performance by combining the chosen actions or action probabilities of different RL algorithms. We designed and implemented four ensemble methods that combine the following five RL algorithms: Q-learning, Sarsa, actor-critic (AC), QV-learning, and AC learning automaton. The intuitively designed ensemble methods, namely majority voting (MV), rank voting, Boltzmann multiplication (BM), and Boltzmann addition, combine the policies derived from the value functions of the different RL algorithms, in contrast to previous work, where ensemble methods have been used in RL to represent and learn a single value function. We report experiments on five maze problems of varying complexity; the first problem is simple, but the other four maze tasks are of a dynamic or partially observable nature. The results indicate that the BM and MV ensembles significantly outperform the single RL algorithms.
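To make the combination rules concrete, below is a minimal sketch (not the authors' implementation) of the MV and BM rules described above, assuming each of the five algorithms exposes Q-like per-action preference values for the current state; the function names and the temperature parameter tau are illustrative.

import numpy as np

def majority_voting(preferences, tau=1.0):
    # MV: each algorithm casts one vote for its greedy action;
    # the vote counts are turned into a Boltzmann action policy.
    votes = np.zeros(preferences.shape[1])
    for prefs in preferences:                  # one row per algorithm
        votes[np.argmax(prefs)] += 1.0
    expv = np.exp(votes / tau)
    return expv / expv.sum()                   # ensemble action probabilities

def boltzmann_multiplication(preferences, tau=1.0):
    # BM: multiply the Boltzmann action probabilities of all
    # algorithms elementwise, then renormalise.
    probs = np.exp(preferences / tau)
    probs /= probs.sum(axis=1, keepdims=True)  # per-algorithm policies
    combined = probs.prod(axis=0)
    return combined / combined.sum()

# Example: 5 algorithms, 4 actions (random preferences for illustration).
prefs = np.random.randn(5, 4)
action = np.random.choice(4, p=boltzmann_multiplication(prefs))

A design note on the difference between the two rules: MV discards everything except each algorithm's single preferred action, while BM lets an algorithm that assigns near-zero probability to an action effectively veto it, since a small factor drives the product toward zero.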
