How can we segment a varying number of objects when each specific object is its own separate class? To make the problem more realistic, how can we add and delete classes on the fly without retraining? This situation arises in robotic applications where no dataset of the objects exists, and in applications involving thousands of objects (e.g., in logistics), where it is infeasible to train a single model to learn all of them. Most current research on object segmentation for robotic grasping focuses on class-level object segmentation (e.g., box, cup, bottle), closed sets (the specific objects of a dataset, such as the YCB dataset), or deep learning-based template matching. In this work, we are interested in open sets, where the number of classes is unknown and varying and no prior knowledge of the object types is available. We consider each specific object as its own separate class. Our goal is to develop a zero-shot object detector that requires no training and can add any object as a class from just a few captured images of that object. Our main idea is to split the segmentation pipeline into two steps: an unseen object segmentation network followed by a zero-shot classifier. We evaluate our zero-shot object detector on unseen datasets and compare it to a Mask R-CNN trained on those datasets. The results show that performance varies from practical to unsuitable depending on the environment setup and the objects being handled. The code is available in our DoUnseen library repository.
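The second step of this two-step idea can be illustrated with a minimal sketch. It is not the DoUnseen API: the `ObjectGallery` class, its method names, and the `min_similarity` threshold are hypothetical, and the embeddings are toy vectors standing in for features that some pretrained extractor would produce for each class-agnostic segment. The sketch assumes that classification is done by cosine similarity against a few reference embeddings per object, which is one common way to build a zero-shot classifier and may differ from the method used in this work.

```python
import numpy as np


def _normalize(x):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    x = np.atleast_2d(np.asarray(x, dtype=np.float64))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


class ObjectGallery:
    """Hypothetical gallery: object name -> embeddings of its few reference images."""

    def __init__(self):
        self._refs = {}

    def add_object(self, name, embeddings):
        # Adding a class only stores reference embeddings; nothing is retrained.
        self._refs[name] = _normalize(embeddings)

    def remove_object(self, name):
        # Deleting a class is just as cheap.
        self._refs.pop(name, None)

    def classify(self, segment_embeddings, min_similarity=0.5):
        """Label each segment with its most similar gallery object, or None."""
        labels = []
        for seg in _normalize(segment_embeddings):
            best_name, best_score = None, min_similarity
            for name, refs in self._refs.items():
                # Best match over this object's reference images.
                score = float(np.max(refs @ seg))
                if score > best_score:
                    best_name, best_score = name, score
            labels.append(best_name)
        return labels


# Toy embeddings for three segments returned by a class-agnostic segmenter (assumed).
gallery = ObjectGallery()
gallery.add_object("mug", [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0]])
gallery.add_object("drill", [[0.0, 1.0, 0.1]])
segments = [[0.95, 0.15, 0.0], [0.05, 0.9, 0.1], [0.0, 0.0, 1.0]]
print(gallery.classify(segments))

gallery.remove_object("drill")  # drop a class on the fly
print(gallery.classify(segments))
```

Segments whose best similarity stays below the threshold receive no label, which is how such a scheme would reject objects that are not in the gallery. The threshold value here is arbitrary and would need tuning for a real feature extractor.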