Batch reinforcement learning in a complex domain

14 May 2007

conference paper
Published by Association for Computing Machinery (ACM)

https://doi.org/10.1145/1329125.1329241

Abstract

Temporal difference reinforcement learning algorithms are perfectly suited to autonomous agents because they learn directly from an agent's experience based on sequential actions in the environment. However, their most common algorithmic variants are relatively inefficient in their use of experience data, which in many agent-based settings can be scarce. In particular, they make just one learning "update" for each atomic experience. Batch reinforcement learning algorithms, on the other hand, aim to achieve greater data efficiency by saving experience data and using it in aggregate to make updates to the learned policy. Their success has been demonstrated in the past on simple domains like grid worlds and low-dimensional control applications like pole balancing. In this paper, we compare and contrast batch reinforcement learning algorithms with on-line algorithms based on their empirical performance in a complex, continuous, noisy, multiagent domain, namely RoboCup soccer Keepaway. We find that the two batch methods we consider, Experience Replay and Fitted Q Iteration, both yield significant gains in sample complexity, while achieving high asymptotic performance.
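The contrast between one update per experience and batch reuse of stored experience can be illustrated with a minimal sketch. It is not the paper's implementation: the paper works in Keepaway, a continuous domain, whereas this sketch assumes a hypothetical 4-state chain world, a tabular Q function, and illustrative values for alpha, gamma, and the number of sweeps. The function names (q_update, learn_online, learn_with_replay) are likewise invented for illustration.

```python
def q_update(Q, transition, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update for a single stored transition."""
    s, a, r, s_next, done = transition
    target = r if done else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def new_q(n_states=4, n_actions=2):
    """Return a zero-initialised tabular Q function."""
    return [[0.0] * n_actions for _ in range(n_states)]


# One episode in the assumed chain world: action 1 moves right,
# and reaching state 3 ends the episode with reward 1.
# Each entry is (state, action, reward, next_state, done).
episode = [
    (0, 1, 0.0, 1, False),
    (1, 1, 0.0, 2, False),
    (2, 1, 1.0, 3, True),
]


def learn_online(episode):
    """On-line learning: each experience is used for exactly one update."""
    Q = new_q()
    for transition in episode:
        q_update(Q, transition)
    return Q


def learn_with_replay(episode, sweeps=50):
    """Simplified experience replay: store the data, then sweep it repeatedly."""
    Q = new_q()
    buffer = list(episode)  # saved experience
    for _ in range(sweeps):
        for transition in buffer:
            q_update(Q, transition)
    return Q


if __name__ == "__main__":
    print("online:", learn_online(episode))
    print("replay:", learn_with_replay(episode))
```

With the same three transitions, the on-line pass changes only the value of the final step, while the replayed batch propagates the reward back to the start state. This is the sense in which batch methods extract more from scarce experience; the same data is simply used for more updates.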

Funding Information

Division of Information and Intelligent Systems (IIS-0237699)
Defense Advanced Research Projects Agency (HR0011-04-1-0035)
National Science Foundation (EIA-0303609)

Cited by 29 articles