Reinforcement learning of motor skills in high dimensions: A path integral approach

1 May 2010

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 2397-2403
https://doi.org/10.1109/robot.2010.5509336

Abstract

Reinforcement learning (RL) is one of the most general approaches to learning control. Its application to complex motor systems, however, has so far been largely out of reach because of the computational difficulties that reinforcement learning encounters in high-dimensional continuous state-action spaces. In this paper, we derive a novel approach to RL for parameterized control policies based on the framework of stochastic optimal control with path integrals. While solidly grounded in optimal control theory and estimation theory, the update equations for learning are surprisingly simple and carry no risk of numerical instability, since they require neither matrix inversions nor gradient learning rates. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to high-dimensional control problems. Finally, a learning experiment on a robot dog illustrates the functionality of our algorithm in a real-world scenario. We believe that our new algorithm, Policy Improvement with Path Integrals (PI²), is currently one of the most efficient, numerically robust, and easy-to-implement algorithms for RL in robotics.
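
To give a feel for why an update of this kind needs neither a matrix inversion nor a gradient learning rate, the following is a minimal sketch of a cost-weighted averaging step. It is a simplified, episode-level illustration in the spirit of PI², not the update equations from the paper: the full algorithm weights exploration noise per time step along each trajectory, which is omitted here. The names `weighted_update` and `quadratic_cost`, the sensitivity `h`, the noise scale `sigma`, and the toy cost are all assumptions made for illustration.

```python
import math
import random


def weighted_update(theta, cost_fn, n_rollouts=20, sigma=0.5, h=10.0, rng=None):
    """One simplified, episode-level parameter update (illustrative only).

    Samples noisy copies of the parameters, scores each with cost_fn, and
    moves theta by the probability-weighted average of the sampled noise.
    """
    rng = rng or random.Random(0)
    # Exploration: one Gaussian perturbation of the parameters per rollout.
    noises = [[rng.gauss(0.0, sigma) for _ in theta] for _ in range(n_rollouts)]
    costs = [cost_fn([t + e for t, e in zip(theta, eps)]) for eps in noises]

    # Map costs to [0, 1] so that the sensitivity h has a consistent meaning.
    c_min, c_max = min(costs), max(costs)
    span = (c_max - c_min) or 1.0
    weights = [math.exp(-h * (c - c_min) / span) for c in costs]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Low-cost rollouts dominate the average; no gradient or step size is used.
    delta = [
        sum(p * eps[i] for p, eps in zip(probs, noises))
        for i in range(len(theta))
    ]
    return [t + d for t, d in zip(theta, delta)]


def quadratic_cost(theta):
    """Hypothetical toy cost: squared distance to a fixed target vector."""
    target = [1.0, -2.0, 0.5]
    return sum((t - g) ** 2 for t, g in zip(theta, target))


def run(n_updates=50):
    """Apply the update repeatedly from a zero initial parameter vector."""
    rng = random.Random(0)
    theta = [0.0, 0.0, 0.0]
    for _ in range(n_updates):
        theta = weighted_update(theta, quadratic_cost, rng=rng)
    return theta
```

Calling `run()` returns the parameter vector after 50 updates; comparing `quadratic_cost(run())` with `quadratic_cost([0.0, 0.0, 0.0])` shows how far the cost has dropped. The only tunable quantities in this sketch are the exploration noise and the weighting sensitivity, which is the property the abstract highlights.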
