We present Wasserstein Adaptive Value Estimation for Actor-Critic (WAVE), an approach that enhances stability in deep reinforcement learning through adaptive Wasserstein regularization. Our method addresses the inherent instability of actor-critic algorithms by incorporating an adaptively weighted Wasserstein regularization term into the critic's loss function. We prove that WAVE achieves an $\mathcal{O}\left(\frac{1}{k}\right)$ convergence rate for the critic's mean squared error and provide theoretical stability guarantees via the Wasserstein-based regularizer. Using the Sinkhorn approximation for computational efficiency, WAVE automatically adjusts the regularization strength based on the agent's performance. Theoretical analysis and experimental results demonstrate that WAVE outperforms standard actor-critic methods.
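To make the mechanism concrete, the sketch below shows one way the regularized critic objective could look: a TD mean-squared-error term plus a Sinkhorn-approximated Wasserstein penalty between predicted and target value samples, with the penalty weight adapted from recent performance. This is a minimal illustration under stated assumptions; the helper names (`sinkhorn_cost`, `wave_critic_loss`) and the specific adaptation rule for the weight are hypothetical, not the paper's exact formulation.

```python
import torch


def sinkhorn_cost(x, y, eps=0.1, n_iters=50):
    """Entropic-regularized OT cost between two 1-D samples (Sinkhorn approximation)."""
    # Squared-distance cost matrix between the two empirical samples.
    C = (x[:, None] - y[None, :]) ** 2
    mu = torch.full((x.shape[0],), 1.0 / x.shape[0])
    nu = torch.full((y.shape[0],), 1.0 / y.shape[0])
    K = torch.exp(-C / eps)                  # Gibbs kernel
    u = torch.ones_like(mu)
    for _ in range(n_iters):                 # Sinkhorn fixed-point iterations
        v = nu / (K.t() @ u)
        u = mu / (K @ v)
    P = u[:, None] * K * v[None, :]          # approximate transport plan
    return (P * C).sum()                     # transport cost under the plan


def wave_critic_loss(q_pred, q_target, recent_return, baseline_return, lam_max=1.0):
    """Hypothetical WAVE-style critic objective: TD mean-squared error plus an
    adaptively weighted Sinkhorn term between predicted and target values."""
    mse = torch.mean((q_pred - q_target) ** 2)
    # Hypothetical adaptive weight: regularize more strongly when the agent's
    # recent performance lags a running baseline (assumption, for illustration).
    lam = lam_max * torch.sigmoid(torch.as_tensor(baseline_return - recent_return))
    return mse + lam * sinkhorn_cost(q_pred, q_target.detach())
```

In this sketch the Sinkhorn iterations keep the Wasserstein term differentiable and cheap to evaluate on minibatches, which is the role the abstract attributes to the Sinkhorn approximation.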