Robot grasping, the ability of a robot to pick up an object, is crucial for applications such as assembly and sorting. In such tasks, selecting the right target to pick is as essential as inferring a correct gripper configuration. A common solution relies on semantic segmentation models, which often generalize poorly to unseen objects and require considerable time and massive amounts of data to train. To reduce the need for large datasets, some grasping pipelines exploit few-shot semantic segmentation models, which can recognize new classes from a few examples. However, this often comes at the cost of limited performance, and fine-tuning is required for these models to be effective in robot grasping scenarios. In this work, we propose to overcome these limitations by combining the strong generalization capability of foundation models with a high-performing few-shot classifier, which acts as a score function to select the segmentation closest to the support set. The proposed model is designed to be embedded in a grasp synthesis pipeline. Extensive experiments using one or five examples show that our approach overcomes existing performance limitations, improving the state of the art both in few-shot semantic segmentation, on the Graspnet-1B (+10.5% mIoU) and Ocid-grasp (+1.6% AP) datasets, and in real-world few-shot grasp synthesis (+21.7% grasp accuracy). The project page is available at: https://leobarcellona.github.io/showandgrasp.github.io/
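The selection step described above (a few-shot classifier used as a score function over candidate segmentations) can be illustrated with a minimal sketch. This is not the classifier used in this work: it assumes, purely for illustration, that each candidate segment and each support example has already been reduced to a feature vector, and it swaps in a simple nearest-prototype rule with cosine similarity. The names `class_prototype` and `select_segment`, and the toy vectors, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def class_prototype(support: Sequence[Vector]) -> List[float]:
    """Average the support-set features (one or five examples) into one prototype."""
    n = len(support)
    return [sum(column) / n for column in zip(*support)]


def select_segment(candidates: Sequence[Vector],
                   support: Sequence[Vector]) -> Tuple[int, float]:
    """Score every candidate segment against the prototype and return the best one."""
    prototype = class_prototype(support)
    scores = [cosine_similarity(c, prototype) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy data (assumed): two support examples and three candidate segments.
support = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0]]
candidates = [[0.0, 1.0, 0.0], [1.0, 0.05, 0.05], [0.0, 0.0, 1.0]]

best_index, best_score = select_segment(candidates, support)
print(best_index, round(best_score, 3))
```

In a grasp synthesis pipeline, the index returned by such a score function would identify which segmented region to pass on to the grasp planner; the feature extractor and similarity measure are the parts a real system would replace.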