Introduction of Fixed Mode States into Online Reinforcement Learning with Penalties and Rewards and its Application to Biped Robot Waist Trajectory Generation

Abstract

In long-term reinforcement learning tasks, learning efficiency is heavily degraded because the agent's probabilistic action selection often causes the task to fail, which makes it difficult to reach the goal and receive a reward. To address this problem, this paper proposes a fixed mode state. When the agent acquires an adequate reward, a normal state is switched to a fixed mode state. In a fixed mode state, the agent selects actions with a greedy strategy, i.e., it deterministically selects the action with the highest weight. The paper first combines Online Profit Sharing reinforcement learning with the Penalty Avoiding Rational Policy Making algorithm and then introduces fixed mode states into the combined method. The target task is then formulated: learning a modified waist trajectory for dynamically stable walking, based on the statically stable walking of a biped robot. Finally, simulation results are presented and the effectiveness of the proposed method is discussed.
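The switch between normal and fixed mode states can be illustrated with a minimal Python sketch. It is not the paper's implementation. The abstract does not specify how actions are chosen in a normal state, so weight-proportional (roulette-wheel) selection is assumed here. The names `select_action`, `next_mode`, and `reward_threshold` are hypothetical, and the numeric threshold stands in for the unspecified notion of an "adequate" reward.

```python
import random


def select_action(weights, fixed_mode, rng=random):
    """Return the index of the chosen action given positive action weights."""
    if fixed_mode:
        # Fixed mode state: greedy, deterministic choice of the
        # highest-weight action, as described in the abstract.
        return max(range(len(weights)), key=lambda a: weights[a])
    # Normal state: probabilistic choice. Weight-proportional selection
    # is an assumption of this sketch, not taken from the source.
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def next_mode(fixed_mode, reward, reward_threshold):
    """Switch a normal state to fixed mode once the reward is 'adequate'.

    reward_threshold is a hypothetical stand-in for the adequacy
    criterion; in this sketch a fixed mode state stays fixed.
    """
    return fixed_mode or reward >= reward_threshold


if __name__ == "__main__":
    weights = [0.1, 0.7, 0.2]
    mode = False
    print(select_action(weights, mode))   # some index in {0, 1, 2}
    mode = next_mode(mode, reward=5.0, reward_threshold=3.0)
    print(select_action(weights, mode))   # greedy choice in fixed mode
```

In this sketch, once a state is in fixed mode its action no longer varies from trial to trial, which is the property the abstract relies on to avoid failures caused by probabilistic exploration.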
