We apply functional acceleration to the general family of Policy Mirror Descent (PMD) algorithms, which covers a wide range of novel and fundamental methods in Reinforcement Learning (RL). Leveraging duality, we propose a momentum-based PMD update. Because it takes the functional route, our approach is independent of the policy parametrization and applicable to large-scale optimization, and it covers previous applications of momentum at the level of policy parameters as a special case. We theoretically analyze several properties of this approach and complement the analysis with a numerical ablation study, which illustrates the policy optimization dynamics on the value polytope under different algorithmic design choices. We further characterize numerically several features of the problem setting that are relevant for functional acceleration, and lastly, we investigate the impact of approximation on the learning mechanics of these methods.
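For orientation, the kind of update the abstract refers to can be written out in the tabular case. This is a sketch under stated assumptions, not the paper's exact rule: the step size $\eta$, the momentum coefficient $\beta$, the negative-entropy mirror map $h$, and the extrapolated form of the momentum term are illustrative choices that the abstract does not specify. A standard PMD step at state $s$ is

\[
\pi_{t+1}(\cdot \mid s) \;=\; \arg\max_{p \in \Delta(\mathcal{A})} \Big\{ \eta \,\big\langle Q^{\pi_t}(s,\cdot),\, p \big\rangle \;-\; D_h\big(p,\, \pi_t(\cdot \mid s)\big) \Big\},
\]

where $D_h$ is the Bregman divergence induced by $h$. When $h$ is the negative entropy, $D_h$ is the KL divergence and the step has the closed form

\[
\pi_{t+1}(a \mid s) \;\propto\; \pi_t(a \mid s)\, \exp\!\big(\eta\, Q^{\pi_t}(s,a)\big).
\]

One hypothetical way to add momentum at the functional level is to replace $Q^{\pi_t}$ in the step above with an extrapolated value function,

\[
\widetilde{Q}_t(s,a) \;=\; Q^{\pi_t}(s,a) \;+\; \beta\,\big(Q^{\pi_t}(s,a) - Q^{\pi_{t-1}}(s,a)\big), \qquad \beta \ge 0 .
\]

In this form the momentum acts on the action-value function rather than on any policy parameters, which is the sense in which such an update does not depend on the parametrization; setting $\beta = 0$ recovers the plain PMD step.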