Exploitation-Oriented Learning PS-r^#

Abstract

Exploitation-oriented learning (XoL) is a novel approach to goal-directed learning from interaction. Whereas reinforcement learning focuses on learning and ensures optimality in Markov decision process (MDP) environments, XoL aims to learn a rational policy that obtains rewards continuously and very quickly. PS-r^*, a form of XoL, learns a useful rational policy that is not inferior to a random walk in partially observable Markov decision processes (POMDPs) with only one type of reward. PS-r^*, however, requires O(MN²) memory, where N is the number of sensory input types and M is the number of actions. We propose PS-r^# for learning a useful rational policy in POMDPs using O(MN) memory. The effectiveness of PS-r^# is confirmed in numerical examples.
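
To give a feel for the memory bounds quoted above, the following is a minimal sketch that compares how a table with M·N² entries grows against one with M·N entries. It illustrates the asymptotic bounds only: the abstract does not describe the actual data structures of PS-r^* or PS-r^#, so the function names, the constant factor of 1, and the choice of M = 4 actions are all assumptions made for illustration.

```python
def entries_quadratic(num_inputs: int, num_actions: int) -> int:
    """Entry count for a hypothetical table that scales as M * N^2.

    Assumption: constant factor of 1; this is not the real PS-r^* layout.
    """
    return num_actions * num_inputs ** 2


def entries_linear(num_inputs: int, num_actions: int) -> int:
    """Entry count for a hypothetical table that scales as M * N.

    Assumption: constant factor of 1; this is not the real PS-r^# layout.
    """
    return num_actions * num_inputs


if __name__ == "__main__":
    num_actions = 4  # hypothetical number of actions (M)
    for num_inputs in (10, 100, 1000):  # hypothetical sensory input counts (N)
        quad = entries_quadratic(num_inputs, num_actions)
        lin = entries_linear(num_inputs, num_actions)
        # Under these assumptions the ratio between the two equals N.
        print(f"N={num_inputs:>5}  M*N^2={quad:>10}  M*N={lin:>6}  ratio={quad // lin}")
```

Under these assumptions the gap between the two bounds grows linearly with the number of sensory input types, which is why the reduction matters most in environments with many distinct observations.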

Cited by 23 articles
