One-dimensional system arising in stochastic gradient descent
- 1 June 2021
- journal article
- research article
- Published by Cambridge University Press (CUP) in Advances in Applied Probability
- Vol. 53 (2), 575-607
- https://doi.org/10.1017/apr.2020.10
Abstract
We consider stochastic differential equations of the form $dX_t = |f(X_t)|/t^{\gamma} dt+1/t^{\gamma} dB_t$ , where f(x) behaves comparably to $|x|^k$ in a neighborhood of the origin, for $k\in [1,\infty)$ . We show that there exists a threshold value $ \,{:}\,{\raise-1.5pt{=}}\, \tilde{\gamma}$ for $\gamma$ , depending on k, such that if $\gamma \in (1/2, \tilde{\gamma})$ , then $\mathbb{P}(X_t\rightarrow 0) = 0$ , and for the rest of the permissible values of $\gamma$ , $\mathbb{P}(X_t\rightarrow 0)>0$ . These results extend to discrete processes that satisfy $X_{n+1}-X_n = f(X_n)/n^\gamma +Y_n/n^\gamma$ . Here, $Y_{n+1}$ are martingale differences that are almost surely bounded. This result shows that for a function F whose second derivative at degenerate saddle points is of polynomial order, it is always possible to escape saddle points via the iteration $X_{n+1}-X_n =F'(X_n)/n^\gamma +Y_n/n^\gamma$ for a suitable choice of $\gamma$ .
Keywords
This publication has 16 references indexed in Scilit:
- A survey of random processes with reinforcementProbability Surveys, 2007
- Learning Rates for Q-LearningLecture Notes in Computer Science, 2001
- Stochastic optimization applied to a manufacturing system operation problemPublished by Association for Computing Machinery (ACM) ,1995
- Acceleration of Stochastic Approximation by AveragingSIAM Journal on Control and Optimization, 1992
- On the Law of the Iterated Logarithm for MartingalesThe Annals of Probability, 1992
- Recursive Stochastic Algorithms for Global Optimization in $\mathbb{R}^d $SIAM Journal on Control and Optimization, 1991
- When are Touchpoints Limits for Generalized Polya URNS?Proceedings of the American Mathematical Society, 1991
- Nonconvergence to Unstable Points in Urn Models and Stochastic ApproximationsThe Annals of Probability, 1990
- A Strong Law for Some Generalized Urn ProcessesThe Annals of Probability, 1980
- A Stochastic Approximation MethodThe Annals of Mathematical Statistics, 1951