Sample efficiency is critical when applying learning-based methods to robotic manipulation, owing to the high cost of collecting expert demonstrations and the challenges of on-robot policy learning through online Reinforcement Learning (RL). Offline RL addresses this issue by enabling policy learning from an offline dataset collected with any behavioral policy, regardless of its quality. However, recent advances in offline RL have predominantly focused on learning from large datasets. Given that many robotic manipulation tasks can be formulated as rotation-symmetric problems, we investigate the use of $SO(2)$-equivariant neural networks for offline RL with a limited number of demonstrations. Our experimental results show that equivariant versions of Conservative Q-Learning (CQL) and Implicit Q-Learning (IQL) outperform their non-equivariant counterparts. We provide empirical evidence demonstrating how equivariance improves offline learning algorithms in the low-data regime.
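As a rough illustration of the kind of network the abstract refers to, the sketch below builds a small rotation-equivariant image encoder with the e2cnn library, discretizing $SO(2)$ as the cyclic group $C_8$ (a common approximation). The library choice, layer sizes, and input encoding are assumptions for illustration only, not the paper's implementation.

```python
# A minimal sketch of a C8-discretized SO(2)-equivariant encoder (e.g., the
# convolutional trunk of a Q-network), assuming the e2cnn library.
import torch
from e2cnn import gspaces
from e2cnn import nn as enn

# Discretize SO(2) with the cyclic group C8: equivariance to 45-degree rotations.
gspace = gspaces.Rot2dOnR2(N=8)

# Input: a 1-channel top-down observation, encoded as a trivial (scalar) field.
in_type = enn.FieldType(gspace, [gspace.trivial_repr])
hid_type = enn.FieldType(gspace, 16 * [gspace.regular_repr])
out_type = enn.FieldType(gspace, 32 * [gspace.regular_repr])

encoder = enn.SequentialModule(
    enn.R2Conv(in_type, hid_type, kernel_size=5, padding=2),
    enn.ReLU(hid_type),
    enn.R2Conv(hid_type, out_type, kernel_size=5, padding=2),
    enn.ReLU(out_type),
)

# Rotating the input observation by a multiple of 45 degrees rotates the
# output feature fields consistently, so the downstream critic sees rotated
# states as the same underlying state up to symmetry.
x = enn.GeometricTensor(torch.randn(1, 1, 64, 64), in_type)
y = encoder(x)
```

Under this construction, a Q-function head built on such an encoder satisfies the rotational symmetry constraint by design rather than having to learn it from data, which is the mechanism by which equivariant CQL and IQL can extract more from a limited demonstration set.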