The convergence of deterministic policy gradient under the Hadamard parametrization is studied in the tabular setting, and the global linear convergence of the algorithm is established. To this end, we first show that the error decreases at an $O(\frac{1}{k})$ rate over all iterations. Building on this result, we further show that the algorithm attains a faster local linear convergence rate after $k_0$ iterations, where $k_0$ is a constant that depends only on the MDP and the step size. Overall, the algorithm converges linearly over all iterations, with a looser constant than that of the local linear convergence rate.
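One way to see how a sublinear bound and a local linear bound can combine into a global linear bound is sketched below. The notation is assumed for illustration and is not taken from the abstract: $e_k$ denotes the error at iteration $k$, $C_1 > 0$ is the constant in the $O(\frac{1}{k})$ bound, and $\rho \in (0,1)$ is the local contraction factor. The actual argument and constants may differ.

$$
e_k \le \frac{C_1}{k} \quad (k \ge 1), \qquad e_k \le \rho^{\,k-k_0}\, e_{k_0} \quad (k > k_0).
$$

For $1 \le k \le k_0$ we have $\rho^{\,k-k_0} \ge 1$, so $e_k \le C_1 \le C_1 \rho^{\,k-k_0}$. For $k > k_0$, the local bound gives $e_k \le \rho^{\,k-k_0}\, \frac{C_1}{k_0} \le C_1 \rho^{\,k-k_0}$. Under these assumptions, both cases yield

$$
e_k \le \left(C_1 \rho^{-k_0}\right) \rho^{\,k} \quad \text{for all } k \ge 1,
$$

which is a linear rate over all iterations whose multiplicative constant $C_1 \rho^{-k_0}$ is looser than the one in the local bound.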