Exploitation-Oriented Learning PS-r#
- 20 November 2009
- journal article
- Published by Fuji Technology Press Ltd. in Journal of Advanced Computational Intelligence and Intelligent Informatics
- Vol. 13 (6), 624-630
- https://doi.org/10.20965/jaciii.2009.p0624
Abstract
Exploitation-oriented learning (XoL) is a novel approach to goal-directed learning from interaction. Whereas reinforcement learning focuses on learning optimal policies and guarantees optimality in Markov decision process (MDP) environments, XoL aims to learn a rational policy that obtains rewards continuously and very quickly. PS-r*, a form of XoL, learns a useful rational policy that is not inferior to a random walk in partially observable Markov decision process (POMDP) environments with a single type of reward. PS-r*, however, requires O(MN²) memory, where N is the number of sensory input types and M is the number of action types. We propose PS-r#, which learns a useful rational policy in the POMDP using O(MN) memory. The effectiveness of PS-r# is confirmed in numerical examples.
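The memory claim can be illustrated with a minimal sketch. The table shapes below are assumptions for illustration only (the paper's actual PS-r* and PS-r# data structures are not reproduced here): a structure that conditions on pairs of observations needs one entry per (observation, observation, action) triple, i.e. O(MN²) entries, while one keyed by single observations needs one entry per (observation, action) pair, i.e. O(MN) entries.

```python
from itertools import product

N = 50  # number of sensory input (observation) types
M = 4   # number of action types

observations = range(N)
actions = range(M)

# O(MN^2) table: one weight per (obs, obs, action) triple, as a
# PS-r*-style structure conditioning on observation pairs might keep
# (illustrative assumption, not the paper's exact data structure).
pair_table = {(o1, o2, a): 0.0
              for o1, o2, a in product(observations, observations, actions)}

# O(MN) table: one weight per (obs, action) pair, matching the
# memory bound stated for PS-r#.
single_table = {(o, a): 0.0 for o, a in product(observations, actions)}

print(len(pair_table))    # 10000 entries = M * N^2
print(len(single_table))  # 200 entries = M * N
```

For N = 50 observations the pair-indexed table is already 50 times larger; the gap grows linearly in N, which is what makes the O(MN) bound attractive for large sensory spaces.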