We present Wasserstein Adaptive Value Estimation for Actor-Critic (WAVE), an approach that enhances stability in deep reinforcement learning through adaptive Wasserstein regularization. Our method addresses the inherent instability of actor-critic algorithms by incorporating an adaptively weighted Wasserstein regularization term into the critic's loss function. We prove that WAVE achieves an $\mathcal{O}\left(\frac{1}{k}\right)$ convergence rate for the critic's mean squared error and provide theoretical stability guarantees via the Wasserstein-based regularizer. For computational efficiency, the Wasserstein term is computed with the Sinkhorn approximation, and its weight is adjusted automatically based on the agent's performance. Theoretical analysis and experimental results demonstrate that WAVE outperforms standard actor-critic methods.
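To make the regularized critic objective concrete, the following is a minimal sketch of how a Sinkhorn-approximated Wasserstein term could be added to a critic's mean-squared-error loss. The function `sinkhorn_distance`, the toy value arrays, and the fixed weight `lam` are illustrative assumptions, not the paper's actual implementation (in particular, WAVE adapts the weight from the agent's performance rather than fixing it):

```python
import numpy as np

def sinkhorn_distance(a, b, cost, eps=0.05, n_iters=200):
    """Entropic-regularized OT cost between histograms a and b (Sinkhorn iterations)."""
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # alternate scaling updates
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]  # approximate transport plan
    return float(np.sum(P * cost))   # transport cost under the plan

# Toy critic outputs and bootstrapped targets (hypothetical values).
pred = np.array([2.0, 3.0, 4.0])
target = np.array([2.5, 3.5, 4.5])

# Standard critic loss: mean squared TD error.
mse = float(np.mean((pred - target) ** 2))

# Treat predictions and targets as uniform empirical distributions
# over their support; the ground cost is squared distance.
a = np.full(len(pred), 1.0 / len(pred))
b = np.full(len(target), 1.0 / len(target))
cost = (pred[:, None] - target[None, :]) ** 2
w = sinkhorn_distance(a, b, cost)

lam = 0.1  # placeholder weight; WAVE would adapt this online
loss = mse + lam * w
```

The key design point is that the Wasserstein term penalizes distributional drift between the critic's value estimates and their targets, which is what the adaptive weighting in WAVE is meant to control.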