We analyze stochastic approximation algorithms whose iterates, in expectation, make progress towards an objective at each step. When this progress is proportional to the step size of the algorithm, we prove exponential concentration bounds. These tail bounds contrast with the asymptotic normality results that are more frequently associated with stochastic approximation. The methods we develop rely on a geometric ergodicity proof, which extends a result on Markov chains due to Hajek (1982) to stochastic approximation algorithms. We apply our results to several stochastic approximation algorithms, specifically Projected Stochastic Gradient Descent, Kiefer-Wolfowitz, and Stochastic Frank-Wolfe. When applicable, our results prove faster $O(1/t)$ and linear convergence rates for Projected Stochastic Gradient Descent with a non-vanishing gradient.
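As an informal illustration of the setting described above (not the paper's own construction), the following minimal sketch runs projected stochastic gradient descent on a toy problem whose gradient does not vanish at the optimum: minimizing $f(x) = x$ over the interval $[0, 1]$. The objective, the Gaussian noise model, the constant step size, and the names `projected_sgd`, `mean_gap`, and `frac_far` are all assumptions chosen for the example. Because the expected gradient is 1 everywhere, each step moves the iterate towards the minimizer $x^\star = 0$ by an amount proportional to the step size, so one would expect the iterates to stay concentrated near the boundary.

```python
import random


def projected_sgd(steps, step_size, seed=0, lo=0.0, hi=1.0, x0=1.0):
    """Projected SGD on f(x) = x over [lo, hi] with noisy gradients.

    Toy setup (an assumption for illustration): the true gradient is 1
    everywhere, so it does not vanish at the constrained minimizer x = lo.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    x = x0
    path = []
    for _ in range(steps):
        # Noisy gradient estimate: true gradient 1 plus standard Gaussian noise.
        grad = 1.0 + rng.gauss(0.0, 1.0)
        # Gradient step followed by projection back onto [lo, hi].
        x = min(hi, max(lo, x - step_size * grad))
        path.append(x)
    return path


path = projected_sgd(steps=20000, step_size=0.01)
tail = path[5000:]  # discard an initial burn-in period

# Average distance from the minimizer x* = 0 after burn-in.
mean_gap = sum(tail) / len(tail)
# Fraction of post-burn-in iterates further than 0.1 from the minimizer.
frac_far = sum(1 for x in tail if x > 0.1) / len(tail)

print(f"mean gap: {mean_gap:.4f}, fraction beyond 0.1: {frac_far:.4f}")
```

Rerunning with a smaller `step_size` should pull the iterates closer to the minimizer, which is the qualitative behavior the concentration bounds describe; the sketch makes no attempt to reproduce the paper's constants or rates.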