Show and Grasp: Few-shot Semantic Segmentation for Robot Grasping through Zero-shot Foundation Models

The ability of a robot to pick up an object, known as robot grasping, is crucial for several applications, such as assembly or sorting. In such tasks, selecting the right target to pick is as essential as inferring a correct configuration of the gripper. A common solution to this problem relies on semantic segmentation models, which often generalize poorly to unseen objects and require considerable time and massive data to train. To reduce the need for large datasets, some grasping pipelines exploit few-shot semantic segmentation models, which are capable of recognizing new classes given a few examples. However, this often comes at the cost of limited performance, and fine-tuning is required for these models to be effective in robot grasping scenarios. In this work, we propose to overcome these limitations by combining the generalization capability of foundation models with a high-performing few-shot classifier, which works as a score function to select the segmentation that is closest to the support set. The proposed model is designed to be embedded in a grasp synthesis pipeline. Extensive experiments using one or five examples show that our approach overcomes existing performance limitations, improving the state of the art both in few-shot semantic segmentation on the Graspnet-1B (+10.5% mIoU) and Ocid-grasp (+1.6% AP) datasets, and in real-world few-shot grasp synthesis (+21.7% grasp accuracy). The project page is available at: https://leobarcellona.github.io/showandgrasp.github.io/
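The selection step described in the abstract (a few-shot classifier acting as a score function over candidate segmentations) can be illustrated with a minimal sketch. The abstract does not specify the classifier or the embedding model, so everything here is an assumption made for illustration: candidate masks and support examples are taken to be already encoded as feature vectors, the support set is summarized by its mean (a prototype), and cosine similarity stands in for the score function. The names `mean_vector`, `cosine_similarity`, and `select_mask` are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Average the support embeddings into a single prototype (assumed design)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity, returning 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def select_mask(candidates: List[Vector], support: List[Vector]) -> int:
    """Return the index of the candidate embedding closest to the support prototype.

    Stand-in for the score function: the paper's actual classifier may differ.
    """
    prototype = mean_vector(support)
    scores = [cosine_similarity(c, prototype) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data: two support embeddings (a 2-shot case) and three candidate masks.
support = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0]]
candidates = [[0.0, 1.0, 0.0], [1.0, 0.05, 0.05], [0.0, 0.0, 1.0]]
best = select_mask(candidates, support)
print(best)
```

In a one-shot setting the prototype is simply the single support embedding; with five examples it is their average. In a grasp pipeline, the selected index would identify the mask passed on to grasp synthesis.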


