We survey Lyapunov-based techniques for the finite-time analysis of stochastic iterative algorithms, also known as stochastic approximation (SA) algorithms, for solving fixed-point equations $\bar{F}(x)=x$, where the operator $\bar{F}(\cdot)$ can only be accessed through a noisy oracle. We first focus on the standard setting in which $\bar{F}(\cdot)$ is contractive with respect to some norm and the noise is i.i.d., and explain how generalized Moreau envelopes serve as universal Lyapunov functions, regardless of the underlying norm. We then show how this framework yields mean-square convergence guarantees and applies to stochastic gradient descent, linear SA, and value-based reinforcement learning algorithms such as Q-learning and temporal-difference learning. Finally, we discuss extensions to Markovian noise, seminorm-contractive operators, dissipative operators, and high-probability bounds, and conclude with open problems. The goal is to present a unified and self-contained roadmap for the finite-time analysis of SA and its applications, especially in reinforcement learning.
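As a concrete illustration of the standard setting described above, the following minimal sketch runs the iteration $x_{k+1} = x_k + \alpha_k\,(\bar{F}(x_k) + w_k - x_k)$ on a toy problem. Everything specific here is an assumption made for illustration and is not taken from the survey: the affine operator $\bar{F}(x) = Ax + b$ with $A = \gamma P$ for a row-stochastic $P$ (so $\bar{F}$ is a $\gamma$-contraction in the $\ell_\infty$ norm), the i.i.d. Gaussian noise, the step size $\alpha_k = 4/(k+4)$, and the helper names `make_toy_problem`, `noisy_oracle`, and `run_sa`.

```python
import numpy as np


def make_toy_problem(rng, dim=3, gamma=0.5):
    """Build a hypothetical affine operator F_bar(x) = A x + b.

    A = gamma * P with P row-stochastic, so F_bar is a gamma-contraction
    in the sup norm. Returns A, b, and the exact fixed point x_star.
    """
    P = rng.random((dim, dim))
    P /= P.sum(axis=1, keepdims=True)  # normalize rows to sum to 1
    A = gamma * P
    b = rng.random(dim)
    x_star = np.linalg.solve(np.eye(dim) - A, b)  # solves A x + b = x
    return A, b, x_star


def noisy_oracle(x, A, b, rng, noise_std):
    """Return F_bar(x) + w, where w is i.i.d. zero-mean Gaussian noise."""
    return A @ x + b + noise_std * rng.standard_normal(x.shape)


def run_sa(A, b, x0, num_steps, rng, noise_std=0.1):
    """Run x_{k+1} = x_k + alpha_k * (F_bar(x_k) + w_k - x_k).

    The diminishing step size alpha_k = 4 / (k + 4) is an assumed choice
    for this toy problem, not a recommendation from the survey.
    """
    x = np.array(x0, dtype=float)
    for k in range(num_steps):
        alpha = 4.0 / (k + 4.0)
        x = x + alpha * (noisy_oracle(x, A, b, rng, noise_std) - x)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, b, x_star = make_toy_problem(rng)
    x_final = run_sa(A, b, np.zeros(3), num_steps=5000, rng=rng)
    print("sup-norm error:", np.max(np.abs(x_final - x_star)))
```

The error is measured in the sup norm because that is the norm in which this toy operator contracts. The sketch only shows the iterates approaching the fixed point empirically; it does not implement the Moreau-envelope Lyapunov argument itself.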