Reinforcement Learning for Penalty Avoidance in Continuous State Spaces
- 20 July 2007
- journal article
- Published by Fuji Technology Press Ltd. in Journal of Advanced Computational Intelligence and Intelligent Informatics
- Vol. 11 (6), 668-676
- https://doi.org/10.20965/jaciii.2007.p0668
Abstract
Reinforcement learning involves learning to adapt to an environment using rewards, a special kind of input, as clues. To obtain rational policies quickly, profit sharing (PS) [6], the rational policy making algorithm (RPM) [7], the penalty avoiding rational policy making algorithm (PARP) [8], and PS-r* [9] have been proposed; these are collectively called PS-based methods. Applying reinforcement learning to real problems sometimes requires handling continuous-valued input. A method based on RPM [10] has been proposed as a PS-based method for continuous-valued input, but it assumes an environment with rewards only and cannot suitably handle penalties. We study the treatment of continuous-valued input suitable for a PS-based method in an environment that includes both rewards and penalties. Specifically, we extend PARP to continuous-valued input so that it simultaneously aims at obtaining rewards and avoiding penalties. We applied our proposal to the pole-cart balancing problem and confirmed its validity.
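To make the "PS-based" idea concrete: profit sharing distributes a received reward back over the episode's (state, action) pairs, giving more credit to steps closer to the reward. Below is a minimal illustrative sketch of this credit-assignment scheme; the function name, weight table, and geometric decay constant are assumptions for illustration, not the paper's notation or algorithm.

```python
# Illustrative sketch of profit-sharing-style credit assignment.
# When a reward arrives at the end of an episode, credit is shared
# backward along the episode with geometric decay, so rules fired
# closer to the reward accumulate more weight.

def profit_sharing_update(weights, episode, reward, decay=0.5):
    """Distribute `reward` over the (state, action) pairs of `episode`.

    weights: dict mapping (state, action) -> accumulated weight
    episode: list of (state, action) pairs, oldest first
    decay:   geometric decay per step back from the reward (0 < decay < 1)
    """
    credit = reward
    for state, action in reversed(episode):
        weights[(state, action)] = weights.get((state, action), 0.0) + credit
        credit *= decay
    return weights

# Usage: a three-step episode ending in a reward of 1.0
w = profit_sharing_update({}, [("s0", "a"), ("s1", "b"), ("s2", "a")], 1.0)
# w[("s2", "a")] == 1.0, w[("s1", "b")] == 0.5, w[("s0", "a")] == 0.25
```

A penalty-avoiding variant such as PARP additionally marks rules that can lead to penalty states and excludes them from the policy; handling that, and extending the discrete weight table to continuous-valued input, is the subject of the paper.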
This publication has 10 references indexed in Scilit:
- Reinforcement Learning in Multi-dimensional State-action Space Using Random Tiling and Gibbs Sampling. Transactions of the Society of Instrument and Control Engineers, 2006
- A Reinforcement Learning Algorithm for Continuous State Spaces using Multiple Fuzzy-ART Networks. Published by Institute of Electrical and Electronics Engineers (IEEE), 2006
- Exploration and apprenticeship learning in reinforcement learning. Published by Association for Computing Machinery (ACM), 2005
- An Extension of Profit Sharing to Partially Observable Markov Decision Processes: Proposition of PS-r* and its Evaluation. Transactions of the Japanese Society for Artificial Intelligence, 2003
- Theoretical analysis of the unimodal normal distribution crossover for real-coded genetic algorithms. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Reinforcement learning for penalty avoiding policy making. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Reinforcement Learning: An Introduction. IEEE Transactions on Neural Networks, 1998
- Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces. Adaptive Behavior, 1997
- Q-learning. Machine Learning, 1992
- Learning to predict by the methods of temporal differences. Machine Learning, 1988