This paper presents a framework for learning vision-based robotic policies for contact-rich manipulation tasks that generalize spatially across task configurations. We focus on robust spatial generalization of a peg-in-hole (PiH) policy trained from a small number of demonstrations. We propose EquiContact, a hierarchical policy composed of a high-level vision planner (Diffusion Equivariant Descriptor Field, Diff-EDF) and a novel low-level compliant visuomotor policy (Geometric Compliant ACT, G-CompACT). G-CompACT uses only localized observations, namely geometrically consistent error vectors (GCEV), force-torque readings, and wrist-mounted RGB images, and produces actions defined in the end-effector frame. Through these design choices, we show that the entire EquiContact pipeline is SE(3)-equivariant, from perception to force control. We also outline three key components of spatially generalizable contact-rich policies: compliance, localized policies, and induced equivariance. Real-world experiments on PiH, screwing, and surface-wiping tasks demonstrate near-perfect success rates and robust generalization to unseen spatial configurations, validating the proposed framework and principles.
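The link between localized observations, end-effector-frame actions, and SE(3)-equivariance can be checked numerically on a toy example. The sketch below is not the authors' implementation. It assumes the pose error is the relative transform `T_ee^-1 * T_goal` (the abstract does not give the GCEV definition), and `toy_local_policy` is a hypothetical stand-in for G-CompACT that ignores images and force-torque readings. All function and variable names are illustrative.

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def make_pose(yaw, pitch, roll, p):
    """4x4 homogeneous transform from ZYX Euler angles and a position."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    R = [[cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
         [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
         [-sp, cp * sr, cp * cr]]
    return [R[i] + [p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def inverse(T):
    """Inverse of a rigid transform: (R, p)^-1 = (R^T, -R^T p)."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    p = [-sum(Rt[i][k] * T[k][3] for k in range(3)) for i in range(3)]
    return [Rt[i] + [p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def local_error(T_ee, T_goal):
    """Goal pose expressed in the end-effector frame (assumed error form)."""
    return matmul(inverse(T_ee), T_goal)

def toy_local_policy(E, gain=0.5):
    """Hypothetical policy: step along the local position error (EE frame)."""
    return [gain * E[i][3] for i in range(3)]

def rotate(T, v):
    """Apply only the rotation part of T to a 3-vector."""
    return [sum(T[i][k] * v[k] for k in range(3)) for i in range(3)]

def max_gap(A, B):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Arbitrary end-effector pose, goal pose, and global scene transform H.
T_ee = make_pose(0.3, -0.2, 0.1, [0.4, 0.1, 0.3])
T_goal = make_pose(0.5, 0.1, -0.4, [0.6, -0.2, 0.1])
H = make_pose(1.2, 0.7, -0.9, [-1.0, 2.0, 0.5])

# Invariance: the local error is unchanged when H moves both poses.
E1 = local_error(T_ee, T_goal)
E2 = local_error(matmul(H, T_ee), matmul(H, T_goal))
err_gap = max_gap(E1, E2)

# Equivariance: the world-frame action rotates with the scene.
w1 = rotate(T_ee, toy_local_policy(E1))
w2 = rotate(matmul(H, T_ee), toy_local_policy(E2))
expected = rotate(H, w1)
act_gap = max(abs(a - b) for a, b in zip(w2, expected))

print("local error gap:", err_gap)
print("world action gap:", act_gap)
```

Both gaps should be at floating-point noise level. Because the policy only sees quantities expressed in the end-effector frame, moving the whole scene by `H` leaves its input unchanged, and mapping its output back through the transformed end-effector pose yields a world-frame action rotated by `H`.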