Stochasticity, Nonlinear Value Functions, and Update Rules in Learning Aesthetic Biases
Open Access
- 10 May 2021
- Research article
- Published by Frontiers Media SA in Frontiers in Human Neuroscience
Abstract
A theoretical framework for the reinforcement learning of aesthetic biases was recently proposed based on brain circuitries revealed by neuroimaging. A model grounded in that framework accounted for interesting features of human aesthetic biases, including individuality, cultural predispositions, the stochastic dynamics of learning and aesthetic biases, and the peak-shift effect. Despite this success, a potential weakness was the linearity of the value function used to predict reward: the learning process assumed a linear relationship between reward and sensory stimuli. Linearity is common in reinforcement-learning models in neuroscience, but it can be problematic because both the neural mechanisms and the dependence of reward on sensory stimuli are typically nonlinear. Here, we analyze learning performance with models that include optimal nonlinear value functions. We also compare updating the free parameters of the value functions with the delta rule, which neuroscience models use frequently, vs. updating with a new Phi rule that takes the structure of the nonlinearities into account. Our computer simulations showed that optimal nonlinear value functions reduced learning errors when the reward models were nonlinear. Similarly, the new Phi rule reduced these errors. These improvements were accompanied by a straightening of the trajectories of the vector of free parameters in its phase space, meaning that the process became more efficient at learning to predict reward. Surprisingly, however, this improved efficiency had a complex relationship with the rate of learning. Finally, the stochasticity arising from the probabilistic sampling of sensory stimuli, rewards, and motivations helped the learning process narrow the range of free parameters to nearly optimal outcomes.
Therefore, we suggest that value functions and update rules optimized for social and ecological constraints are ideal for learning aesthetic biases.
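To make the baseline setup concrete, the sketch below shows the standard delta rule updating the free parameters of a linear value function, as commonly used in reinforcement-learning models in neuroscience. The stimulus dimensionality, learning rate, and "true" reward weights are illustrative assumptions, and the paper's Phi rule is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear reward model: reward is a weighted sum of stimulus features.
true_w = np.array([0.5, -0.3, 0.8])

def reward(stimulus):
    return true_w @ stimulus

# Linear value function V(s) = w @ s, with free parameters w learned by the
# delta rule: w <- w + alpha * (r - V(s)) * s
w = np.zeros(3)
alpha = 0.05                         # learning rate (illustrative choice)
for _ in range(2000):
    s = rng.normal(size=3)           # probabilistically sampled sensory stimulus
    r = reward(s)                    # observed reward
    delta = r - w @ s                # reward-prediction error
    w += alpha * delta * s           # delta-rule update of the free parameters

print(np.allclose(w, true_w, atol=1e-2))  # learned weights approach the true ones
```

When the true reward model is nonlinear in the stimulus, this linear value function cannot drive the prediction error to zero, which is the weakness the abstract addresses with optimal nonlinear value functions.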