This paper derives non-asymptotic error bounds for nonlinear stochastic approximation algorithms in the Wasserstein-$p$ distance. To obtain explicit finite-sample guarantees for the last iterate, we develop a coupling argument that compares the discrete-time process to a limiting Ornstein-Uhlenbeck process. Our analysis covers general noise models, including martingale differences and functions of ergodic Markov chains. Complementing this result, we establish the convergence rate of the Polyak-Ruppert average through a direct analysis under the same general setting. Assuming the driving noise satisfies a non-asymptotic central limit theorem, we show that the normalized last iterates converge to a Gaussian distribution in the Wasserstein-$p$ distance at a rate of order $\gamma_n^{1/6}$, where $\gamma_n$ is the step size. Similarly, the Polyak-Ruppert average is shown to converge in the Wasserstein-$p$ distance at a rate of order $n^{-1/6}$. These distributional guarantees imply high-probability concentration inequalities that improve upon those derived from moment bounds and Markov's inequality. We demonstrate the utility of this approach through two applications: (1) linear stochastic approximation, where we explicitly quantify the transition from heavy-tailed to Gaussian behavior of the iterates, thereby bridging the gap between recent finite-sample analyses and asymptotic theory, and (2) stochastic gradient descent, where we establish the rate of convergence in the corresponding central limit theorem.
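For concreteness, the statements above refer to objects of the following standard form; the notation ($\theta_n$, $h$, $\varepsilon_{n+1}$) is illustrative and not taken from the paper itself. The stochastic approximation recursion with step sizes $\gamma_n$ and its Polyak-Ruppert average are
\[
\theta_{n+1} = \theta_n + \gamma_{n+1}\bigl(h(\theta_n) + \varepsilon_{n+1}\bigr),
\qquad
\bar{\theta}_n = \frac{1}{n}\sum_{k=1}^{n} \theta_k,
\]
and the Wasserstein-$p$ distance between probability measures $\mu$ and $\nu$ is
\[
\mathcal{W}_p(\mu, \nu)
= \Bigl( \inf_{\pi \in \Pi(\mu,\nu)} \int \|x - y\|^{p} \, \pi(\mathrm{d}x, \mathrm{d}y) \Bigr)^{1/p},
\]
where $\Pi(\mu,\nu)$ denotes the set of couplings of $\mu$ and $\nu$. In the SGD application, one would take $h(\theta) = -\nabla f(\theta)$ for an objective $f$, with $\varepsilon_{n+1}$ the gradient-estimation noise.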