This paper proposes a new variant of two-time-scale stochastic approximation for finding the roots of two coupled nonlinear operators when only noisy samples of these operators can be observed. Our key idea is to use the classic Ruppert-Polyak averaging technique to dynamically estimate the operators from their samples. The resulting averaged estimates are then used in the two-time-scale stochastic approximation updates to find the desired solution. Our main theoretical result shows that, when the underlying nonlinear operators are strongly monotone, the mean-squared errors of the iterates generated by the proposed method converge to zero at the optimal rate $O(1/k)$, where $k$ is the number of iterations. This significantly improves on existing results for two-time-scale stochastic approximation, where the best known finite-time convergence rate is $O(1/k^{2/3})$. We illustrate this result by applying the proposed method to develop new reinforcement learning algorithms with improved performance.
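To make the idea of averaging the operator samples before each update concrete, the following is a minimal Python sketch on a toy problem. It is not the paper's algorithm or experiment: the linear operators `F` and `G`, the function name `averaged_two_time_scale_sa`, the Gaussian noise model, and the step sizes `lam`, `alpha`, and `beta` are all hypothetical choices made for illustration, and the abstract does not specify the exact update rule or step-size schedule.

```python
import random


def F(x, y):
    # Hypothetical toy operator tied to the fast iterate x (not from the paper).
    return x - 0.5 * y - 1.0


def G(x, y):
    # Hypothetical toy operator tied to the slow iterate y (not from the paper).
    return y + 0.5 * x - 2.0


def averaged_two_time_scale_sa(num_iters=20000, noise_std=1.0, seed=0):
    """Two-time-scale updates driven by running averages of noisy samples.

    The step sizes below are illustrative assumptions, not the schedule
    analyzed in the paper.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0  # iterates
    f, g = 0.0, 0.0  # running estimates of F(x, y) and G(x, y)
    for k in range(num_iters):
        lam = 4.0 / (k + 10)    # averaging step size (assumed)
        alpha = 1.0 / (k + 10)  # step size for the fast iterate x (assumed)
        beta = 0.5 / (k + 10)   # step size for the slow iterate y (assumed)

        # Only noisy samples of the operators are observed.
        f_sample = F(x, y) + rng.gauss(0.0, noise_std)
        g_sample = G(x, y) + rng.gauss(0.0, noise_std)

        # Averaging step: blend each new sample into the running estimate.
        f = (1.0 - lam) * f + lam * f_sample
        g = (1.0 - lam) * g + lam * g_sample

        # Update the iterates with the averaged estimates, not the raw samples.
        x -= alpha * f
        y -= beta * g
    return x, y


if __name__ == "__main__":
    x, y = averaged_two_time_scale_sa()
    print(f"x = {x:.3f}, y = {y:.3f}")
```

For this toy pair of operators the exact root is $(x, y) = (1.6, 1.2)$, so the returned iterates can be compared against it. Replacing `f` and `g` in the last two updates with the raw samples `f_sample` and `g_sample` gives a plain two-time-scale scheme without averaging, which may serve as a rough point of comparison.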