Group symmetries provide a powerful inductive bias for reinforcement learning (RL), enabling efficient generalization across symmetric states and actions via group-invariant Markov Decision Processes (MDPs). However, real-world environments almost never realize fully group-invariant MDPs; dynamics, actuation limits, and reward design usually break symmetries, often only locally. When group-invariant Bellman backups are applied in such cases, local symmetry breaking introduces errors that propagate across the entire state-action space, resulting in global value estimation errors. To address this, we introduce the Partially group-Invariant MDP (PI-MDP), which selectively applies group-invariant or standard Bellman backups depending on where symmetry holds. This framework mitigates error propagation from locally broken symmetries while retaining the benefits of equivariance, thereby improving sample efficiency and generalization. Building on this framework, we present practical RL algorithms -- Partially Equivariant (PE)-DQN for discrete control and PE-SAC for continuous control -- that combine the benefits of equivariance with robustness to symmetry breaking. Experiments across Grid-World, locomotion, and manipulation benchmarks demonstrate that PE-DQN and PE-SAC significantly outperform baselines, highlighting the importance of selective symmetry exploitation for robust and sample-efficient RL. Project page: https://pranaboy72.github.io/perl_page/
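The selective backup described above can be illustrated with a small tabular sketch. This is not the PE-DQN or PE-SAC implementation; it only shows the switching mechanism under loudly stated assumptions: a hypothetical 5-state chain with terminal ends, a Z2 reflection as the symmetry group, a reward that is asymmetric only at transitions into the ends, and a hand-written gate `is_symmetric` (the abstract does not say how the symmetric region is identified, so the gate here is purely illustrative). All names in the block are invented for this example.

```python
from typing import Callable, Dict, Tuple

State = int
Action = int
QTable = Dict[Tuple[State, Action], float]

N = 5                # chain states 0..4; both ends are terminal (toy assumption)
ACTIONS = (-1, +1)   # move left / move right
GAMMA = 0.9


def reflect(s: State, a: Action) -> Tuple[State, Action]:
    """Z2 reflection of the chain: mirror the state and flip the action."""
    return N - 1 - s, -a


def step(s: State, a: Action) -> Tuple[State, float, bool]:
    """Toy dynamics: the right end pays 1.0 and the left end pays 0.0,
    so the one-step reward is asymmetric only at transitions into the ends."""
    s2 = s + a
    reward = 1.0 if s2 == N - 1 else 0.0
    done = s2 in (0, N - 1)
    return s2, reward, done


def standard_target(q: QTable, s: State, a: Action) -> float:
    """Ordinary one-step Bellman target."""
    s2, r, done = step(s, a)
    return r if done else r + GAMMA * max(q[(s2, b)] for b in ACTIONS)


def invariant_target(q: QTable, s: State, a: Action) -> float:
    """Group-invariant target: average the standard target over the Z2 orbit."""
    gs, ga = reflect(s, a)
    return 0.5 * (standard_target(q, s, a) + standard_target(q, gs, ga))


def partial_backup(q: QTable, is_symmetric: Callable[[State, Action], bool]) -> QTable:
    """One sweep over non-terminal states, choosing the backup type per (s, a)."""
    new_q = dict(q)
    for s in range(1, N - 1):
        for a in ACTIONS:
            target = invariant_target if is_symmetric(s, a) else standard_target
            new_q[(s, a)] = target(q, s, a)
    return new_q


def zero_q() -> QTable:
    return {(s, a): 0.0 for s in range(N) for a in ACTIONS}


def gate(s: State, a: Action) -> bool:
    """Hypothetical hand-written gate: treat transitions into an end as asymmetric."""
    _, _, done = step(s, a)
    return not done


q_partial = partial_backup(zero_q(), gate)                # selective backup
q_forced = partial_backup(zero_q(), lambda s, a: True)    # invariant backup everywhere
```

After one sweep from a zero table, the selective backup keeps the two end transitions distinct (`q_partial[(3, 1)]` is 1.0 and `q_partial[(1, -1)]` is 0.0), whereas forcing the invariant backup everywhere averages them to 0.5 each, a small instance of the error that locally broken symmetry introduces. The sketch demonstrates the gating only; it makes no claim about convergence or about how a gate should be chosen in practice.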