A Computational Model of Learned Avoidance Behavior in a One-Way Avoidance Experiment

Abstract
We present a computational model of learned avoidance behavior in a one-way avoidance experiment. Our model employs the reinforcement learning paradigm and a temporal-difference algorithm to implement both classically conditioned and instrumentally conditioned components. The role of the classically conditioned component is to develop an expectation of future benefit as a function of the learning system's state and action. Competition among the instrumentally conditioned components determines the overt behavior generated by the learning system. In simulation, our model reproduces the decreasing latency of the avoidance response over successive learning trials and the response's resistance to extinction, consistent with experimentally observed animal behavior. Our model extends the traditional two-process learning mechanism of Mowrer by explicitly defining mechanisms for proprioceptive feedback, an internal clock, and generalization over the action space.
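The two-process architecture described above (a classically conditioned expectation of future benefit trained by a temporal-difference error, with competing instrumentally conditioned action preferences selecting behavior) can be sketched as a small actor-critic loop. The sketch below is illustrative only, not the paper's implementation: the two-action shuttle box, the step-count "internal clock" state, the softmax choice rule, and all constants (SHOCK_DELAY, ALPHA, BETA, GAMMA) are assumptions introduced for this example.

```python
# Minimal actor-critic sketch of the two-process idea, under the stated
# assumptions. The critic V plays the role of the classically conditioned
# component (expectation of future benefit); the actor preferences H play
# the role of the competing instrumentally conditioned components.
import math
import random

N_TRIALS    = 60
SHOCK_DELAY = 5          # steps from warning-signal onset to shock onset (assumed)
ACTIONS     = ("stay", "cross")
ALPHA, BETA = 0.2, 0.2   # critic / actor learning rates (illustrative)
GAMMA       = 0.95       # temporal discount factor (illustrative)

# State = steps elapsed since warning-signal onset (a crude internal clock).
V = [0.0] * (SHOCK_DELAY + 1)                                    # critic values
H = [{a: 0.0 for a in ACTIONS} for _ in range(SHOCK_DELAY + 1)]  # actor preferences

def choose(prefs, tau=0.5):
    """Softmax competition among the instrumental action preferences."""
    weights = {a: math.exp(p / tau) for a, p in prefs.items()}
    r = random.uniform(0.0, sum(weights.values()))
    for a, w in weights.items():
        r -= w
        if r <= 0.0:
            return a
    return ACTIONS[-1]

for trial in range(N_TRIALS):
    t = 0
    while True:
        a = choose(H[t])
        if t == SHOCK_DELAY:            # shock onset: trial ends (escape)
            r, v_next, done = -1.0, 0.0, True
        elif a == "cross":              # crossed before shock: avoidance
            r, v_next, done = 0.0, 0.0, True
        else:                           # stayed: the internal clock advances
            r, v_next, done = 0.0, V[t + 1], False

        delta    = r + GAMMA * v_next - V[t]   # temporal-difference error
        V[t]    += ALPHA * delta               # classically conditioned update
        H[t][a] += BETA * delta                # instrumentally conditioned update

        if done:
            if trial % 10 == 0:
                print(f"trial {trial:3d}: response latency {t} steps")
            break
        t += 1
```

In this toy setting, crossing out of a state with a negative value estimate yields a positive temporal-difference error, so the crossing preference grows and response latency falls across trials. Once early crossing dominates, the shock is no longer experienced, the error shrinks toward zero, and the learned preference stops changing, which mirrors the resistance to extinction noted in the abstract.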