Introduction of Fixed Mode States into Online Reinforcement Learning with Penalties and Rewards and its Application to Biped Robot Waist Trajectory Generation
- 20 September 2012
- journal article
- Published by Fuji Technology Press Ltd. in Journal of Advanced Computational Intelligence and Intelligent Informatics
- Vol. 16 (6), 758-768
- https://doi.org/10.20965/jaciii.2012.p0758
Abstract
During a long-term reinforcement learning task, learning efficiency degrades severely because the agent's probabilistic actions often cause the task to fail, which makes it difficult to reach the goal and receive a reward. To address this problem, this paper proposes a fixed mode state. If the agent acquires an adequate reward, a normal state is switched to a fixed mode state. In this mode, the agent selects an action using a greedy strategy, i.e., it deterministically selects the action with the highest weight. This paper first combines Online Profit Sharing reinforcement learning with the Penalty Avoiding Rational Policy Making algorithm and then introduces fixed mode states into the combined method. The target task is then formulated: learning a modified waist trajectory for dynamically stable walking, based on the statically stable walking of a biped robot. Finally, we present our simulation results and discuss the effectiveness of the proposed method.
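The switch between normal and fixed mode states described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `select_action` and `maybe_fix_state`, the `threshold` parameter standing in for an "adequate reward", and the use of weight-proportional (roulette) selection in normal states are all assumptions made for illustration.

```python
import random


def select_action(weights, fixed_mode, rng=random):
    """Pick an action from a dict mapping action -> positive weight.

    In a fixed mode state the choice is greedy (highest weight).
    In a normal state the choice is probabilistic; weight-proportional
    sampling is assumed here, the abstract does not specify the rule.
    """
    if fixed_mode:
        # Greedy: deterministically take the highest-weight action.
        return max(weights, key=weights.get)
    actions = list(weights)
    return rng.choices(actions, weights=[weights[a] for a in actions], k=1)[0]


def maybe_fix_state(fixed_states, state, reward, threshold):
    """Mark `state` as fixed once the reward is judged adequate.

    `threshold` is a hypothetical stand-in for the paper's notion of
    an adequate reward.
    """
    if reward >= threshold:
        fixed_states.add(state)
    return fixed_states


# Usage: state "s0" starts in normal mode, then becomes fixed.
fixed_states = set()
weights = {"left": 1.0, "right": 3.0}
first = select_action(weights, "s0" in fixed_states)   # probabilistic
maybe_fix_state(fixed_states, "s0", reward=1.0, threshold=0.5)
second = select_action(weights, "s0" in fixed_states)  # greedy
```

After the switch, the same state always yields the highest-weight action, which removes the chance that a random exploratory action causes a long task to fail late in an episode.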