Can a robot manipulate unseen intra-category objects in arbitrary poses, given only a single demonstration of a grasping pose on one object instance? In this paper, we address this challenge with USEEK, an unsupervised SE(3)-equivariant keypoints method whose keypoints are aligned across instances in a category, and use it to perform generalizable manipulation. USEEK follows a teacher-student structure that decouples unsupervised keypoint discovery from SE(3)-equivariant keypoint detection. With USEEK, the robot can infer category-level, task-relevant object frames in an efficient and explainable manner, enabling manipulation of any intra-category object from and to any pose. Through extensive experiments, we demonstrate that the keypoints produced by USEEK possess rich semantics and therefore successfully transfer functional knowledge from the demonstration object to novel ones. Compared with other object representations for manipulation, USEEK is more adaptive to large intra-category shape variance, more robust with limited demonstrations, and more efficient at inference time.
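To illustrate how aligned keypoints can yield an object frame and carry a demonstrated grasp to a new instance, the following is a minimal sketch and not the USEEK implementation. It assumes three non-collinear keypoints per object are already available, builds a frame from them by Gram-Schmidt orthogonalization (an illustrative choice; the abstract does not specify the construction), and transfers only the grasp position. All function names are hypothetical, and the "novel" object is simply a rigidly moved copy of the demonstration object so that the result can be checked.

```python
import math

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def unit(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

def frame_from_keypoints(k0, k1, k2):
    """Hypothetical helper: right-handed orthonormal frame (axes, origin)
    built from three non-collinear keypoints, with the origin at k0."""
    x = unit(sub(k1, k0))
    v = sub(k2, k0)
    # Gram-Schmidt step: remove the component of v along x.
    y = unit([vi - dot(v, x) * xi for vi, xi in zip(v, x)])
    z = cross(x, y)
    return [x, y, z], list(k0)

def to_local(frame, p):
    """Coordinates of world point p in the given frame."""
    axes, origin = frame
    d = sub(p, origin)
    return [dot(d, a) for a in axes]

def to_world(frame, q):
    """World coordinates of a point q given in the frame."""
    axes, origin = frame
    return [origin[i] + sum(q[j] * axes[j][i] for j in range(3))
            for i in range(3)]

def transfer_point(p_demo, demo_kps, novel_kps):
    """Express p_demo in the demo object's keypoint frame, then map it
    into the world through the novel object's keypoint frame."""
    local = to_local(frame_from_keypoints(*demo_kps), p_demo)
    return to_world(frame_from_keypoints(*novel_kps), local)

# Toy data (assumed, for illustration only).
demo_kps = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
grasp_point = [0.5, 0.2, 0.1]

def rigid(p):
    # Rotate 90 degrees about the z-axis, then translate by (2, 3, 4).
    return [-p[1] + 2.0, p[0] + 3.0, p[2] + 4.0]

# Equivariant keypoints move with the object under a rigid transform.
novel_kps = [rigid(k) for k in demo_kps]
transferred = transfer_point(grasp_point, demo_kps, novel_kps)
expected = rigid(grasp_point)
```

Because the frame is built only from keypoints, it moves rigidly with them, so `transferred` matches `expected` up to floating-point error. A full grasp pose could be transferred the same way by also mapping its orientation axes through the two frames.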