Two-timescale stochastic approximation (TTSA) is among the most general frameworks for iterative stochastic algorithms. It encompasses well-known stochastic optimization methods, such as SGD variants and algorithms designed for bilevel or minimax problems, as well as reinforcement learning methods such as the family of gradient-based temporal difference (GTD) algorithms. In this paper, we conduct an in-depth asymptotic analysis of TTSA under controlled Markovian noise via a central limit theorem (CLT). The analysis uncovers how the underlying Markov chain influences the coupled dynamics of TTSA. Previous CLT results for TTSA consider only martingale difference noise and therefore do not capture this effect. Building upon our CLT, we extend efficient sampling strategies from vanilla SGD to a wider class of TTSA algorithms in distributed learning, thus broadening the scope of Hu et al. (2022). In addition, we use our CLT to derive the statistical properties of GTD algorithms with nonlinear function approximation using Markovian samples, and we show that these algorithms have identical asymptotic performance, a perspective not evident from current finite-time bounds.
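The following minimal sketch illustrates the two-timescale structure described above. It is not the paper's algorithm. It is a toy linear TTSA in which a fast iterate `y` tracks the slow iterate `x`, and both updates are driven by samples from a two-state Markov chain rather than by martingale difference noise. The function name `ttsa_toy`, the update maps, the step-size constants and exponents, and the flip probability are all hypothetical choices. Unlike the controlled setting studied in the paper, the chain's transition probabilities here do not depend on the iterates.

```python
import random


def ttsa_toy(num_iters: int = 100_000, flip_prob: float = 0.3, seed: int = 0):
    """Hypothetical toy linear TTSA driven by a two-state Markov chain.

    Fast iterate y is pushed toward x (root of x - y = 0 for fixed x);
    slow iterate x is pushed toward the point where 1 - y = 0,
    so the coupled pair (x, y) converges to (1, 1).
    The chain is NOT controlled: its flip probability ignores (x, y).
    """
    rng = random.Random(seed)
    state = 1.0       # Markov chain state in {-1, +1}; stationary mean is 0
    x, y = 0.0, 0.0   # slow and fast iterates
    for n in range(1, num_iters + 1):
        # Markovian noise: the state flips with probability flip_prob,
        # so consecutive samples are correlated (not i.i.d.)
        if rng.random() < flip_prob:
            state = -state
        beta = 0.5 / (n + 1) ** 0.6   # fast step size
        gamma = 0.5 / (n + 1) ** 0.9  # slow step size; gamma / beta -> 0
        # both updates use the current (x, y) and the same Markov sample
        y_new = y + beta * (x - y + state)
        x_new = x + gamma * (1.0 - y + state)
        x, y = x_new, y_new
    return x, y
```

Calling `ttsa_toy()` should return a pair close to `(1, 1)`. The ratio of the slow to the fast step size tends to zero, which is what lets `y` equilibrate around the current `x` before `x` moves appreciably.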