Stochasticity, Nonlinear Value Functions, and Update Rules in Learning Aesthetic Biases

Open Access

10 May 2021

journal article
research article
Published by Frontiers Media SA in Frontiers in Human Neuroscience

Vol. 15
https://doi.org/10.3389/fnhum.2021.639081

Abstract

A theoretical framework for the reinforcement learning of aesthetic biases was recently proposed based on brain circuitries revealed by neuroimaging. A model grounded in that framework accounted for interesting features of human aesthetic biases. These features included individuality, cultural predispositions, stochastic dynamics of learning and aesthetic biases, and the peak-shift effect. However, despite the success in explaining these features, a potential weakness was the linearity of the value function used to predict reward. This linearity meant that the learning process employed a value function that assumed a linear relationship between reward and sensory stimuli. Linearity is common in reinforcement learning in neuroscience. However, linearity can be problematic because neural mechanisms and the dependence of reward on sensory stimuli are typically nonlinear. Here, we analyze the learning performance with models including optimal nonlinear value functions. We also compare updating the free parameters of the value functions with the delta rule, which neuroscience models use frequently, vs. updating with a new Phi rule that considers the structure of the nonlinearities. Our computer simulations showed that optimal nonlinear value functions reduced learning errors when the reward models were nonlinear. Similarly, the new Phi rule reduced these errors. These improvements were accompanied by the straightening of the trajectories of the vector of free parameters in its phase space. This straightening meant that the process became more efficient in learning the prediction of reward. Surprisingly, however, this improved efficiency had a complex relationship with the rate of learning. Finally, the stochasticity arising from the probabilistic sampling of sensory stimuli, rewards, and motivations helped the learning process narrow the range of free parameters to nearly optimal outcomes. Therefore, we suggest that value functions and update rules optimized for social and ecological constraints are ideal for learning aesthetic biases.
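
The delta rule with a linear value function, as named in the abstract, can be illustrated with a minimal Python sketch. This is not the authors' model: the uniform sampling of stimuli, the weights in `TRUE_W`, the learning rate, and the `tanh` reward nonlinearity are all assumptions chosen for illustration, and the Phi rule is not sketched because the abstract does not define it. The sketch only shows why a linear value function can leave a residual prediction error when the reward depends nonlinearly on the stimulus.

```python
import math
import random


def dot(w, x):
    """Linear value function V(x) = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def delta_rule_fit(reward_fn, n_features=3, lr=0.05, n_trials=5000, seed=0):
    """Learn the weights of a linear value function with the delta rule.

    All defaults are illustrative assumptions, not values from the article.
    """
    rng = random.Random(seed)
    w = [0.0] * n_features
    for _ in range(n_trials):
        # Stochastic sampling of a sensory stimulus (assumed uniform in [0, 1]).
        x = [rng.random() for _ in range(n_features)]
        # Prediction error: received reward minus predicted reward.
        delta = reward_fn(x) - dot(w, x)
        # Delta rule: move each weight along its input, scaled by the error.
        w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
    return w


def mean_squared_error(w, reward_fn, n_samples=2000, seed=1):
    """Average squared prediction error on freshly sampled stimuli."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = [rng.random() for _ in range(len(w))]
        total += (reward_fn(x) - dot(w, x)) ** 2
    return total / n_samples


# Hypothetical "true" dependence of reward on the stimulus.
TRUE_W = [0.5, -0.2, 0.8]


def linear_reward(x):
    return dot(TRUE_W, x)


def saturating_reward(x):
    # Assumed nonlinearity: reward saturates as the weighted stimulus grows.
    return math.tanh(3.0 * dot(TRUE_W, x))


w_lin = delta_rule_fit(linear_reward)
w_sat = delta_rule_fit(saturating_reward)
mse_lin = mean_squared_error(w_lin, linear_reward)
mse_sat = mean_squared_error(w_sat, saturating_reward)
print("linear reward model, MSE:", mse_lin)
print("saturating reward model, MSE:", mse_sat)
```

Under these assumptions, the linear value function recovers the linear reward model almost exactly, while the saturating reward model leaves an error that the delta rule cannot remove. That gap is the kind of weakness the abstract attributes to linear value functions.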

