Reinforcement Learning for Penalty Avoidance in Continuous State Spaces

Abstract

Reinforcement learning is a form of learning in which an agent adapts to its environment using rewards, a special kind of input, as clues. Profit sharing (PS) [6], the rational policy making algorithm (RPM) [7], the penalty avoiding rational policy making algorithm (PARP) [8], and PS-r* [9] are used to obtain rational policies quickly; these are called PS-based methods. Applying reinforcement learning to real problems sometimes requires handling continuous-valued input. A method based on RPM [10] has been proposed as a PS-based method for continuous-valued input, but it assumes that only rewards exist and cannot handle penalties suitably. We studied how to treat continuous-valued input in a PS-based method when the environment includes both rewards and penalties. Specifically, we propose extending PARP to continuous-valued input so that it simultaneously pursues rewards and avoids penalties. We applied the proposed method to the pole-cart balancing problem and confirmed its validity.
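For readers unfamiliar with PS-based methods, the following is a minimal sketch of the general profit sharing idea: when a reward arrives, credit is distributed backward over the rules (state-action pairs) of the episode with a geometrically decreasing weight. It is not the method proposed in this paper. The function names, the `decay` value, and the uniform binning used to turn a continuous value into a discrete state are all assumptions made for illustration; the abstract does not say how the proposed method represents continuous input.

```python
from collections import defaultdict


def discretize(x, low, high, bins):
    """Map a continuous value to a bin index in [0, bins - 1].

    Uniform binning is an illustrative stand-in only, not the
    treatment of continuous input proposed in the paper.
    """
    if x <= low:
        return 0
    if x >= high:
        return bins - 1
    return int((x - low) / (high - low) * bins)


def profit_sharing_update(weights, episode, reward, decay=0.5):
    """Distribute `reward` backward over the rules of one episode.

    `episode` is a list of (state, action) pairs in the order they
    were executed. The rule executed last receives the full reward,
    and each earlier rule receives `decay` times the credit of its
    successor. The value of `decay` is an assumption.
    """
    credit = reward
    for state, action in reversed(episode):
        weights[(state, action)] += credit
        credit *= decay
    return weights


# Usage: three steps ending in a reward of 1.0.
weights = defaultdict(float)
episode = [(0, "left"), (1, "right"), (2, "right")]
profit_sharing_update(weights, episode, reward=1.0, decay=0.5)
```

Handling penalties, as PARP does, would additionally require identifying and excluding rules that can lead to a penalty; that step is not shown here.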

This publication has 10 references indexed in Scilit:

Reinforcement Learning in Multi-dimensional State-action Space Using Random Tiling and Gibbs Sampling
Transactions of the Society of Instrument and Control Engineers, 2006
A Reinforcement Learning Algorithm for Continuous State Spaces using Multiple Fuzzy-ART Networks
Published by Institute of Electrical and Electronics Engineers (IEEE), 2006
Exploration and apprenticeship learning in reinforcement learning
Published by Association for Computing Machinery (ACM), 2005
An Extension of Profit Sharing to Partially Observable Markov Decision Processes: Proposition of PS-r* and its Evaluation
Transactions of the Japanese Society for Artificial Intelligence, 2003
Theoretical analysis of the unimodal normal distribution crossover for real-coded genetic algorithms
Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
Reinforcement learning for penalty avoiding policy making
Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
Reinforcement Learning: An Introduction
IEEE Transactions on Neural Networks, 1998
Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces
Adaptive Behavior, 1997
Q-learning
Machine Learning, 1992
Learning to predict by the methods of temporal differences
Machine Learning, 1988

Cited by 9 articles