This paper establishes central limit theorems for Polyak-Ruppert averaged Q-learning under asynchronous updates. We prove a non-asymptotic central limit theorem whose convergence rate in Wasserstein distance makes explicit the dependence on the number of iterations, the size of the state-action space, the discount factor, and the quality of exploration. In addition, we derive a functional central limit theorem, showing that the partial-sum process converges weakly to a Brownian motion.
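To make the object of study concrete, the following is a minimal sketch of asynchronous Q-learning with Polyak-Ruppert (iterate) averaging on a randomly generated toy MDP. The MDP, step-size schedule, and all variable names are illustrative assumptions for exposition, not the paper's setting; the averaged iterate `Q_bar` is the quantity whose fluctuations the central limit theorems characterize.

```python
import numpy as np

# Illustrative sketch (assumed toy setup, not from the paper):
# asynchronous Q-learning with Polyak-Ruppert averaging of the iterates.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 3, 2, 0.9

# Random toy MDP: transition kernel P[s, a] (a distribution over next
# states) and a bounded reward table R[s, a] in [0, 1].
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

Q = np.zeros((n_states, n_actions))
Q_bar = np.zeros_like(Q)  # Polyak-Ruppert running average of the iterates
s = 0
T = 50_000
for t in range(1, T + 1):
    a = rng.integers(n_actions)               # uniform exploration (assumed)
    s_next = rng.choice(n_states, p=P[s, a])
    # Asynchronous update: only the visited (s, a) entry changes.
    alpha = 1.0 / t**0.6                      # polynomial step size (assumed)
    td_target = R[s, a] + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    # Running average of the full iterate; Q_bar is the averaged estimator
    # whose rescaled error the (functional) CLTs describe.
    Q_bar += (Q - Q_bar) / t
    s = s_next

print(Q_bar.round(3))
```

Under the averaging, `Q_bar` forgets the large early fluctuations of the raw iterate `Q`, which is the standard motivation for studying its limiting Gaussian behavior.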