Exploitation-Oriented Learning PS-r#

Abstract
Exploitation-oriented learning (XoL) is a novel approach to goal-directed learning from interaction. Whereas reinforcement learning focuses on learning itself and ensures optimality in Markov decision process (MDP) environments, XoL aims to learn a rational policy that obtains rewards continuously and very quickly. PS-r*, a form of XoL, learns a useful rational policy that is not inferior to a random walk in partially observable Markov decision process (POMDP) environments with a single type of reward. PS-r*, however, requires O(MN^2) memory, where N is the number of sensory input types and M is the number of actions. We propose PS-r# for learning a useful rational policy in the POMDP using O(MN) memory. The effectiveness of PS-r# is confirmed in numerical examples.
