Bimanual manipulation is essential yet challenging for robots executing complex tasks, as it requires coordinated collaboration between two arms. However, existing methods for bimanual manipulation often rely on costly data collection and training, and struggle to generalize efficiently to unseen objects in novel categories. In this paper, we present Bi-Adapt, a novel framework designed for efficient generalization of bimanual manipulation via semantic correspondence. Bi-Adapt achieves cross-category affordance mapping by leveraging the strong capabilities of vision foundation models. After fine-tuning with limited data on novel categories, Bi-Adapt exhibits notable zero-shot generalization to out-of-category objects. Extensive experiments conducted in both simulation and real-world environments validate the effectiveness of our approach and demonstrate its high efficiency, achieving high success rates on benchmark tasks across novel categories with limited data. Project website: https://biadapt-project.github.io/