In this paper, we establish the global convergence of the actor-critic algorithm with a significantly improved sample complexity of $O(\epsilon^{-3})$, advancing beyond the existing local convergence results. Previous works provide local convergence guarantees with a sample complexity of $O(\epsilon^{-2})$ for bounding the squared gradient of the return, which translates to a global sample complexity of $O(\epsilon^{-4})$ via the gradient domination lemma. In contrast to traditional methods that employ decreasing step sizes for both the actor and the critic, we demonstrate that a constant step size for the critic is sufficient to ensure convergence in expectation. This key insight shows that a decreasing step size for the actor alone suffices to control the noise from both the actor and the critic. Our findings provide theoretical support for the practical success of many algorithms that rely on constant step sizes.
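To make the step-size schedule concrete, here is a minimal sketch of a tabular actor-critic on a hypothetical two-state, two-action MDP (the MDP, step-size constants, and exponent are illustrative choices, not taken from the paper): the critic runs TD(0) with a constant step size, while the actor uses a decreasing step size $\alpha_t \propto t^{-0.6}$.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP: action 1 tends to move the agent
# to state 1, which yields reward 1 on entry. (Illustrative, not from the paper.)
P = np.array([  # P[s, a] = probability of transitioning to state 1
    [0.1, 0.9],
    [0.2, 0.8],
])
R = np.array([0.0, 1.0])  # reward received on entering each state
gamma = 0.9

rng = np.random.default_rng(0)
theta = np.zeros((2, 2))  # actor: softmax policy parameters
V = np.zeros(2)           # critic: tabular value estimates

beta = 0.1                # constant critic step size (the paper's key insight)
s = 0
for t in range(20000):
    # softmax policy over actions in state s
    logits = theta[s] - theta[s].max()
    pi = np.exp(logits) / np.exp(logits).sum()
    a = rng.choice(2, p=pi)

    s_next = int(rng.random() < P[s, a])
    r = R[s_next]

    # critic update: TD(0) with a CONSTANT step size beta
    delta = r + gamma * V[s_next] - V[s]
    V[s] += beta * delta

    # actor update: policy gradient with a DECREASING step size alpha_t ~ t^{-0.6}
    alpha = 0.5 / (1 + t) ** 0.6
    grad_log = -pi            # gradient of log pi(a|s) w.r.t. theta[s]
    grad_log[a] += 1.0
    theta[s] += alpha * delta * grad_log

    s = s_next

print(np.argmax(theta, axis=1))  # learned greedy action in each state
```

Only the actor's step size decays; the decay rate here is a placeholder, whereas the paper's analysis determines the schedule needed for the $O(\epsilon^{-3})$ guarantee.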