We consider linear two-time-scale stochastic approximation algorithms driven by martingale noise. Recent applications in machine learning motivate the need to understand finite-time error rates, but conventional stochastic approximation analyses focus either on asymptotic convergence in distribution or on finite-time bounds that are far from optimal. Prior work on asymptotic central limit theorems (CLTs) suggests that two-time-scale algorithms may be able to achieve a $1/\sqrt{n}$ error in expectation, with a constant given by the expected norm of the limiting Gaussian vector. However, the best known finite-time rates are much slower. We derive the first nonasymptotic central limit theorem, with respect to the Wasserstein-1 distance, for two-time-scale stochastic approximation with Polyak-Ruppert averaging. As a corollary, we show that the expected error achieved by Polyak-Ruppert averaging decays at rate $1/\sqrt{n}$, significantly improving on the rates of convergence in prior works.
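To make the setting concrete, the following is a minimal sketch of linear two-time-scale stochastic approximation with Polyak-Ruppert averaging on a toy problem. All matrix entries, step-size exponents, and noise levels are illustrative assumptions, not the paper's construction: two coupled iterates are updated with step sizes decaying at different rates (the fast iterate gets the larger step), the noise is a martingale-difference sequence, and the iterates are averaged after a burn-in.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2x2 coupled linear system (all values hypothetical):
# the target (x*, y*) solves A11 x + A12 y = b1 and A21 x + A22 y = b2.
A11, A12, A21, A22 = 2.0, 0.5, 0.3, 1.5
b1, b2 = 1.0, 1.0
x_star, y_star = np.linalg.solve([[A11, A12], [A21, A22]], [b1, b2])

x, y = 0.0, 0.0               # slow and fast iterates
x_avg, y_avg, count = 0.0, 0.0, 0
N = 200_000
for n in range(1, N + 1):
    # Two step-size schedules: the fast iterate y uses the larger step,
    # so a_n / c_n -> 0 (the defining two-time-scale condition).
    a_n = 1.0 / n**0.75       # slow time scale
    c_n = 1.0 / n**0.5        # fast time scale
    # Martingale-difference noise: zero mean, independent across iterations.
    m1, m2 = rng.normal(0.0, 0.5, size=2)
    x = x + a_n * (b1 - A11 * x - A12 * y + m1)
    y = y + c_n * (b2 - A21 * x - A22 * y + m2)
    # Polyak-Ruppert averaging over the second half of the run,
    # discarding the initial transient.
    if n > N // 2:
        x_avg += x
        y_avg += y
        count += 1
x_bar, y_bar = x_avg / count, y_avg / count
```

Under the abstract's result, the expected error of the averaged iterates `(x_bar, y_bar)` decays at rate $1/\sqrt{n}$; the raw iterates `(x, y)` fluctuate at the scale of the current step size and converge more slowly.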