Tuning continual exploration in reinforcement learning: An optimality property of the Boltzmann strategy
- 31 August 2008
- journal article
- Published by Elsevier BV in Neurocomputing
- Vol. 71 (13-15), pp. 2507-2520
- https://doi.org/10.1016/j.neucom.2007.11.040
Abstract
No abstract available.
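Since no abstract is available, a brief note on the paper's subject: the Boltzmann (softmax) exploration strategy selects each action with probability proportional to the exponential of its estimated value, scaled by a temperature parameter that tunes the exploration/exploitation trade-off. A minimal sketch of that standard rule (the function name, Q-value list, and temperature default here are illustrative, not taken from the paper):

```python
import math
import random

def boltzmann_action(q_values, temperature=1.0):
    """Sample an action index with probability proportional to
    exp(Q(a) / temperature) -- the Boltzmann (softmax) strategy.

    High temperature -> near-uniform exploration;
    low temperature -> near-greedy exploitation.
    """
    # Subtract the max Q-value before exponentiating for numerical stability;
    # this shifts all exponents but leaves the probabilities unchanged.
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Draw one action index according to the softmax probabilities.
    return random.choices(range(len(q_values)), weights=probs, k=1)[0]
```

For example, with Q-values `[10.0, 0.0]` and a very low temperature the rule is effectively greedy and almost always returns action 0, while a very high temperature makes both actions nearly equally likely.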
This publication has 11 references indexed in Scilit:
- Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation. IEEE Transactions on Knowledge and Data Engineering, 2007
- The Fastest Mixing Markov Process on a Graph and a Connection to a Maximum Variance Unfolding Problem. SIAM Review, 2006
- A New Q-Learning Algorithm Based on the Metropolis Criterion. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2004
- Fastest Mixing Markov Chain on a Graph. SIAM Review, 2004
- On the Existence of Fixed Points for Approximate Value Iteration and Temporal-Difference Learning. Journal of Optimization Theory and Applications, 2000
- Exploration of Multi-State Environments: Local Measures and Back-Propagation of Uncertainty. Machine Learning, 1999
- Cyclic flows, Markov process and stochastic traffic assignment. Transportation Research Part B: Methodological, 1996
- Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 1996
- Reinforcement learning with replacing eligibility traces. Machine Learning, 1996
- A probabilistic multipath traffic assignment model which obviates path enumeration. Transportation Research, 1971