EquiBim: Learning Symmetry-Equivariant Policy for Bimanual Manipulation

Robotic imitation learning has achieved impressive success in learning complex manipulation behaviors from demonstrations. However, many existing robot learning methods do not explicitly account for the physical symmetries of robotic systems, often resulting in asymmetric or inconsistent behaviors under symmetric observations. This limitation is particularly pronounced in dual-arm manipulation, where bilateral symmetry is inherent to both the robot morphology and the structure of many tasks. In this paper, we introduce EquiBim, a symmetry-equivariant policy learning framework for bimanual manipulation that enforces bilateral equivariance between observations and actions during training. Our approach formulates physical symmetry as a group action on both the observation and action spaces, and imposes an equivariance constraint on policy predictions under symmetric transformations. The framework is model-agnostic and can be integrated into a wide range of imitation learning pipelines with diverse observation modalities and action representations, including point cloud-based and image-based policies, as well as both end-effector-space and joint-space parameterizations. We evaluate EquiBim in simulation on RoboTwin, a dual-arm robotic platform with symmetric kinematics, across diverse observation and action configurations. We further validate the approach on a real-world dual-arm system. Across both simulation and physical experiments, our method consistently improves performance and robustness under distribution shifts. These results suggest that explicitly enforcing physical symmetry provides a simple yet effective inductive bias for bimanual robot learning.
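To make the equivariance constraint concrete, the following is a minimal sketch of how such a constraint might be measured. It is not the implementation from the paper. The abstract only states that a symmetry g acts on both observations and actions and that the policy should satisfy policy(g·o) = g·policy(o). Everything else here is an assumption chosen for illustration: the mirror plane (x -> -x), the 6-D action layout (left-arm position followed by right-arm position, no orientations or grippers), the squared-error penalty, and all function names.

```python
import numpy as np

# Assumption: bilateral symmetry is a reflection across the plane x = 0.
MIRROR = np.array([-1.0, 1.0, 1.0])


def mirror_observation(points):
    """Hypothetical group action on a point cloud of shape (N, 3): flip x."""
    return points * MIRROR


def mirror_action(action):
    """Hypothetical group action on a 6-D action [left xyz, right xyz]:
    flip x for each arm target and swap the left and right arms."""
    left, right = action[:3], action[3:]
    return np.concatenate([right * MIRROR, left * MIRROR])


def equivariance_loss(policy, points):
    """Mean squared gap between policy(g.o) and g.policy(o)."""
    pred_on_mirrored = policy(mirror_observation(points))
    mirrored_pred = mirror_action(policy(points))
    return float(np.mean((pred_on_mirrored - mirrored_pred) ** 2))


def symmetric_policy(points):
    """Toy policy: each arm targets the centroid of the points on its side.
    Assumes at least one point with x < 0 and one with x > 0."""
    left = points[points[:, 0] < 0].mean(axis=0)
    right = points[points[:, 0] > 0].mean(axis=0)
    return np.concatenate([left, right])


def biased_policy(points):
    """Toy policy that breaks the symmetry: the left arm targets the global
    centroid while the right arm always goes to a fixed position."""
    return np.concatenate([points.mean(axis=0), np.array([0.3, 0.0, 0.0])])


if __name__ == "__main__":
    cloud = np.array([
        [-0.4, 0.1, 0.2],
        [-0.2, 0.3, 0.1],
        [0.5, -0.1, 0.3],
        [0.1, 0.2, 0.4],
    ])
    print("symmetric policy gap:", equivariance_loss(symmetric_policy, cloud))
    print("biased policy gap:", equivariance_loss(biased_policy, cloud))
```

One plausible use, again an assumption rather than something the abstract specifies, is to add this gap as a weighted regularizer to the usual imitation loss during training. A realistic action space would also include orientations and gripper states, whose mirrored forms need more care than the sign flip used here.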
