Recent work in reinforcement learning has leveraged symmetries in the model to improve the sample efficiency of policy training. A common simplifying assumption is that the dynamics and the reward exhibit the same symmetry. In many real-world environments, however, the symmetries of the dynamical model hold independently of the reward model, and the reward may not satisfy them. In this paper, we investigate scenarios where only the dynamics are assumed to exhibit symmetry, extending the scope of problems in reinforcement learning and learning in control theory to which symmetry techniques can be applied. Using Cartan's moving frame method, we introduce a technique for learning dynamics that exhibit specified symmetries by construction. We demonstrate through numerical experiments that the proposed method learns a more accurate dynamical model.