In reinforcement learning (RL), exploiting environmental symmetries can significantly enhance efficiency, robustness, and performance. However, ensuring that deep RL policy and value networks are, respectively, equivariant and invariant to these symmetries is a substantial challenge. Related works design networks that are equivariant and invariant by construction. This restricts them to a very limited library of components, which in turn hampers their expressiveness. This paper proposes a method, which we term equivariant ensembles, for constructing equivariant policies and invariant value functions without specialized neural network components. We further add a regularization term that introduces inductive bias during training. In a map-based path-planning case study, we show how equivariant ensembles and regularization benefit sample efficiency and performance.
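The abstract does not spell out how an equivariant ensemble is built. The sketch below uses one standard construction, group averaging, and should not be read as the paper's exact method. An unconstrained network is evaluated on every symmetry-transformed copy of the input. Each policy output is mapped back to the original frame, and the results are averaged. Averaging the value outputs the same way yields an invariant value function. The sketch rests on several assumptions: the symmetry group is the four 90° rotations of a square map, and the four actions are compass moves in the hypothetical order up, right, down, left. The names `BaseNet`, `ensemble_policy`, `ensemble_value`, and `equivariance_penalty` are hypothetical. The penalty is only one plausible form of an equivariance regularizer.

```python
import numpy as np

N_ACTIONS = 4  # hypothetical action order: 0=up, 1=right, 2=down, 3=left


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


class BaseNet:
    """Hypothetical unconstrained policy/value network (not equivariant)."""

    def __init__(self, size, rng):
        self.w_pi = rng.normal(size=(N_ACTIONS, size * size))
        self.w_v = rng.normal(size=size * size)

    def policy(self, grid):
        return softmax(self.w_pi @ np.tanh(grid.ravel()))

    def value(self, grid):
        return float(self.w_v @ np.tanh(grid.ravel()))


def rotate_state(grid, k):
    """Rotate the map by k * 90 degrees counterclockwise."""
    return np.rot90(grid, k)


def rotate_probs(p, k):
    """Express action probabilities in the frame rotated by k * 90 degrees.
    One counterclockwise turn maps up->left, right->up, down->right, left->down."""
    return np.roll(p, -k)


def ensemble_policy(net, grid):
    # pi_ens(s) = 1/|G| * sum_g g^{-1} . pi(g . s), averaged as probabilities
    return np.mean(
        [rotate_probs(net.policy(rotate_state(grid, k)), -k) for k in range(4)],
        axis=0,
    )


def ensemble_value(net, grid):
    # V_ens(s) = 1/|G| * sum_g V(g . s) is invariant under every rotation
    return float(np.mean([net.value(rotate_state(grid, k)) for k in range(4)]))


def equivariance_penalty(policy_fn, value_fn, grid):
    """Hypothetical regularizer: mean squared violation of policy equivariance
    and value invariance over the three non-identity rotations."""
    p0, v0 = policy_fn(grid), value_fn(grid)
    loss = 0.0
    for k in range(1, 4):
        g_s = rotate_state(grid, k)
        loss += np.sum((policy_fn(g_s) - rotate_probs(p0, k)) ** 2)
        loss += (value_fn(g_s) - v0) ** 2
    return loss / 3


rng = np.random.default_rng(0)
net = BaseNet(5, rng)
grid = rng.integers(0, 2, size=(5, 5)).astype(float)
raw = equivariance_penalty(net.policy, net.value, grid)  # typically > 0
ens = equivariance_penalty(
    lambda s: ensemble_policy(net, s), lambda s: ensemble_value(net, s), grid
)  # zero up to floating-point error
```

The ensemble is equivariant for any base network, so the penalty on it vanishes. Under this assumed setup, a regularizer of this kind would mainly be useful on the unconstrained network itself, where it nudges the base network toward the symmetry during training.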