Entropy-regularized Markov decision processes are widely used in reinforcement learning. This paper is concerned with the primal-dual formulation of entropy-regularized problems. Standard first-order methods converge slowly on this formulation because it lacks strict convexity and concavity. To address this issue, we first introduce a new quadratically convexified primal-dual formulation. The natural gradient ascent-descent of the new formulation enjoys a global convergence guarantee and an exponential convergence rate. We also propose a new interpolating metric that further accelerates convergence significantly. Numerical results are provided to demonstrate the performance of the proposed methods under multiple settings.