Optimal habits can develop spontaneously through sensitivity to local cost

25 October 2010

Journal article published in Proceedings of the National Academy of Sciences of the United States of America

Vol. 107 (47), 20512-20517
https://doi.org/10.1073/pnas.1013470107

Abstract

Habits and rituals are expressed universally across animal species. These behaviors are advantageous in allowing sequential behaviors to be performed without cognitive overload, and appear to rely on neural circuits that are relatively benign but vulnerable to takeover by extreme contexts, neuropsychiatric sequelae, and processes leading to addiction. Reinforcement learning (RL) is thought to underlie the formation of optimal habits. However, this theoretic formulation has principally been tested experimentally in simple stimulus-response tasks with relatively few available responses. We asked whether RL could also account for the emergence of habitual action sequences in realistically complex situations in which no repetitive stimulus-response links were present and in which many response options were present. We exposed naïve macaque monkeys to such experimental conditions by introducing a unique free saccade scan task. Despite the highly uncertain conditions and no instruction, the monkeys developed a succession of stereotypical, self-chosen saccade sequence patterns. Remarkably, these continued to morph for months, long after session-averaged reward and cost (eye movement distance) reached asymptote. Prima facie, these continued behavioral changes appeared to challenge RL. However, trial-by-trial analysis showed that pattern changes on adjacent trials were predicted by lowered cost, and RL simulations that reduced the cost reproduced the monkeys' behavior. Ultimately, the patterns settled into stereotypical saccade sequences that minimized the cost of obtaining the reward on average. These findings suggest that brain mechanisms underlying the emergence of habits, and perhaps unwanted repetitive behaviors in clinical disorders, could follow RL algorithms capturing extremely local explore/exploit tradeoffs.
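The idea that scan patterns can keep changing through purely local, trial-to-trial cost reductions can be illustrated with a small simulation. The sketch below is not the authors' RL model: it is a plain stochastic local search, and the grid of targets, the closed-loop scan path, the swap proposal, and the names `path_cost`, `simulate`, and `explore_prob` are all assumptions made for illustration. On each simulated trial the agent proposes a small change to its current saccade sequence and adopts it if the total eye-movement distance drops, or occasionally at random as a crude stand-in for exploration.

```python
import math
import random


def path_cost(order, targets):
    """Total Euclidean length of a closed scan path through `targets` in `order`."""
    n = len(order)
    return sum(
        math.dist(targets[order[i]], targets[order[(i + 1) % n]])
        for i in range(n)
    )


def simulate(targets, n_trials=2000, explore_prob=0.05, seed=0):
    """Hypothetical local explore/exploit loop (illustrative assumption only).

    Each trial swaps two positions in the current sequence. The change is
    kept if it lowers the path cost (exploit) or, with probability
    `explore_prob`, regardless of its cost (explore).
    Returns the final sequence and the per-trial cost history.
    """
    rng = random.Random(seed)
    order = list(range(len(targets)))
    rng.shuffle(order)
    cost = path_cost(order, targets)
    history = [cost]
    for _ in range(n_trials):
        i, j = rng.sample(range(len(order)), 2)
        candidate = order[:]
        candidate[i], candidate[j] = candidate[j], candidate[i]
        candidate_cost = path_cost(candidate, targets)
        if candidate_cost < cost or rng.random() < explore_prob:
            order, cost = candidate, candidate_cost
        history.append(cost)
    return order, history


if __name__ == "__main__":
    # Assumed layout: a 3 x 3 grid of targets with unit spacing.
    grid = [(x, y) for y in range(3) for x in range(3)]
    final_order, costs = simulate(grid)
    print("initial cost:", round(costs[0], 3))
    print("final cost:", round(costs[-1], 3))
    print("final sequence:", final_order)
```

With `explore_prob=0` the cost history can only decrease or stay flat, so the sequence settles into a pattern that no single swap can shorten. A nonzero `explore_prob` lets the pattern keep drifting between sequences of similar cost, which is one simple way to picture behavior that continues to morph after the average cost has levelled off.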
