Tuning continual exploration in reinforcement learning: An optimality property of the Boltzmann strategy
- 31 August 2008
- journal article
- Published by Elsevier BV in Neurocomputing
- Vol. 71 (13-15), pp. 2507-2520
- https://doi.org/10.1016/j.neucom.2007.11.040
Abstract
No abstract available.
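Since no abstract is available, a brief note on the paper's subject: the Boltzmann (softmax) exploration strategy selects each action with probability proportional to the exponential of its estimated value, scaled by a temperature parameter that tunes the exploration/exploitation trade-off. A minimal sketch of that standard rule (the function name, Q-value list, and temperature default here are illustrative, not taken from the paper):

```python
import math
import random

def boltzmann_action(q_values, temperature=1.0):
    """Sample an action index with probability proportional to
    exp(Q(a) / temperature) -- the Boltzmann (softmax) strategy.

    High temperature -> near-uniform exploration;
    low temperature -> near-greedy exploitation.
    """
    # Subtract the max Q-value before exponentiating for numerical stability;
    # this shifts all exponents but leaves the probabilities unchanged.
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Draw one action index according to the softmax probabilities.
    return random.choices(range(len(q_values)), weights=probs, k=1)[0]
```

For example, with Q-values `[10.0, 0.0]` and a very low temperature the rule is effectively greedy and almost always returns action 0, while a very high temperature makes both actions nearly equally likely.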
This publication has 11 references indexed in Scilit:
- Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation. IEEE Transactions on Knowledge and Data Engineering, 2007
- The Fastest Mixing Markov Process on a Graph and a Connection to a Maximum Variance Unfolding Problem. SIAM Review, 2006
- A New Q-Learning Algorithm Based on the Metropolis Criterion. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2004
- Fastest Mixing Markov Chain on a Graph. SIAM Review, 2004
- On the Existence of Fixed Points for Approximate Value Iteration and Temporal-Difference Learning. Journal of Optimization Theory and Applications, 2000
- Exploration of Multi-State Environments: Local Measures and Back-Propagation of Uncertainty. Machine Learning, 1999
- Cyclic flows, Markov process and stochastic traffic assignment. Transportation Research Part B: Methodological, 1996
- Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 1996
- Reinforcement learning with replacing eligibility traces. Machine Learning, 1996
- A probabilistic multipath traffic assignment model which obviates path enumeration. Transportation Research, 1971