Steady-State Behavior of Constant-Stepsize Stochastic Approximation: Gaussian Approximation and Tail Bounds

Constant-stepsize stochastic approximation (SA) is widely used in learning for its computational efficiency. For a fixed stepsize, the iterates typically admit a stationary distribution that is rarely tractable. Prior work shows that as the stepsize $α\downarrow 0$, the centered-and-scaled steady state converges weakly to a Gaussian random vector. However, for fixed $α$, this weak convergence offers no usable error bound for approximating the steady state by its Gaussian limit. This paper provides explicit, non-asymptotic error bounds for fixed $α$. We first prove general-purpose theorems that bound the Wasserstein distance between the centered-and-scaled steady state and an appropriate Gaussian distribution, under regularity conditions on the drift and moment conditions on the noise. To ensure broad applicability, we cover both i.i.d. and Markovian noise models. We then instantiate these theorems for three representative SA settings: (1) stochastic gradient descent (SGD) for smooth strongly convex objectives, (2) linear SA, and (3) contractive nonlinear SA. In each setting we obtain explicit, dimension- and stepsize-dependent bounds in Wasserstein distance of order $α^{1/2}\log(1/α)$ for small $α$. Building on the Wasserstein approximation error, we further derive non-uniform Berry--Esseen-type tail bounds that compare the steady-state tail probability to Gaussian tails, with an explicit error term that decays in both the deviation level and the stepsize $α$. Finally, we adapt the same analysis to SGD beyond strong convexity and study general convex objectives. There we identify a non-Gaussian (Gibbs) limiting law under the correct scaling, which is validated numerically, and provide a corresponding pre-limit Wasserstein error bound.
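To make the object of study concrete, the following is a minimal numerical sketch, not taken from the paper. It assumes a toy scalar linear SA recursion $x_{k+1} = x_k - α(a x_k + w_k)$ with i.i.d. Rademacher noise $w_k = \pm σ$, which can be read as SGD on $f(x) = a x^2/2$. In this toy model the scaled steady state $x/\sqrt{α}$ has variance $σ^2/(a(2 - αa))$, while the small-stepsize Gaussian limit has variance $σ^2/(2a)$. All function names and parameter values are hypothetical choices made for illustration, and the empirical Wasserstein-1 estimate carries Monte Carlo error, so it should not be read as reproducing the paper's rates.

```python
import numpy as np


def scaled_steady_state_samples(alpha, a=1.0, sigma=1.0, n_chains=20000, seed=0):
    """Approximate samples of x / sqrt(alpha) at stationarity.

    Toy model (an assumption, not the paper's setting):
        x_{k+1} = x_k - alpha * (a * x_k + w_k),  w_k = +/- sigma with prob. 1/2.
    Requires 0 < alpha * a < 2 so that the recursion is stable.
    """
    rng = np.random.default_rng(seed)
    # Run for roughly 10 / (alpha * a) steps so the initial condition is forgotten.
    n_steps = int(np.ceil(10.0 / (alpha * a)))
    x = np.zeros(n_chains)  # one independent chain per entry
    for _ in range(n_steps):
        w = sigma * rng.choice([-1.0, 1.0], size=n_chains)
        x = x - alpha * (a * x + w)
    return x / np.sqrt(alpha)


def gaussian_limit_samples(n, a=1.0, sigma=1.0, seed=1):
    """Samples from the small-stepsize Gaussian limit N(0, sigma^2 / (2a))."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, sigma / np.sqrt(2.0 * a), size=n)


def empirical_w1(u, v):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    In one dimension the optimal coupling matches sorted samples.
    """
    return float(np.mean(np.abs(np.sort(u) - np.sort(v))))


if __name__ == "__main__":
    for alpha in (0.2, 0.1, 0.05):
        y = scaled_steady_state_samples(alpha)
        z = gaussian_limit_samples(y.size)
        exact_var = 1.0 / (2.0 - alpha)  # sigma^2 / (a * (2 - alpha * a)) with a = sigma = 1
        print(f"alpha={alpha}: var={np.var(y):.4f} (exact {exact_var:.4f}), "
              f"W1 estimate={empirical_w1(y, z):.4f}")
```

The sketch uses many independent chains rather than one long trajectory so that the samples are independent, which keeps the sorted-sample Wasserstein estimate simple. In higher dimensions the Gaussian limit's covariance would instead come from a Lyapunov equation, and the Wasserstein distance would no longer reduce to sorting.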
