In reinforcement learning (RL), exploiting environmental symmetries can significantly enhance efficiency, robustness, and performance. However, ensuring that the deep RL policy and value networks are respectively equivariant and invariant, so as to exploit these symmetries, is a substantial challenge. Related work tries to design networks that are equivariant or invariant by construction, which limits them to a very restricted library of components and in turn hampers the expressiveness of the networks. This paper proposes a method to construct equivariant policies and invariant value functions without specialized neural network components, which we term equivariant ensembles. We further introduce a regularization term that adds an inductive bias during training. In a map-based path planning case study, we show how equivariant ensembles and regularization benefit sample efficiency and performance.
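The core idea of symmetrizing an unconstrained network over a symmetry group can be illustrated with a minimal sketch. The example below is an assumption-laden illustration, not the paper's implementation: it assumes a square map-like state acted on by the cyclic group of 90-degree rotations, a policy that outputs logits over four directional actions (ordered so that a rotation cyclically permutes them), and a scalar value function. Averaging the inverse-transformed policy outputs over the group orbit yields an equivariant policy, while plain averaging of the value over the orbit yields an invariant value function.

```python
import numpy as np

def rotate_state(s, k):
    """Rotate a square, map-like state by k * 90 degrees (group action on states)."""
    return np.rot90(s, k)

def rotate_action_logits(logits, k):
    """Assumed action ordering [up, left, down, right]: a k * 90-degree rotation
    acts on the four directional actions as a cyclic shift of the logits."""
    return np.roll(logits, k)

def equivariant_policy(policy_fn, s, group_size=4):
    """Symmetrize an arbitrary policy network: evaluate it on every transformed
    state and average the inverse-transformed outputs, giving equivariance."""
    logits = [rotate_action_logits(policy_fn(rotate_state(s, k)), -k)
              for k in range(group_size)]
    return np.mean(logits, axis=0)

def invariant_value(value_fn, s, group_size=4):
    """The value is a scalar, so averaging over the orbit yields invariance."""
    return np.mean([value_fn(rotate_state(s, k)) for k in range(group_size)])
```

For any base `policy_fn`, the symmetrized policy satisfies `equivariant_policy(policy_fn, rotate_state(s, 1)) == rotate_action_logits(equivariant_policy(policy_fn, s), 1)`, i.e. rotating the map rotates the action distribution accordingly, with no constraint on the underlying network architecture.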