How can we segment a varying number of objects when each specific object is its own separate class? To make the problem even more realistic, how can we add and delete classes on the fly without retraining or fine-tuning? This is the case in robotic applications where no datasets of the objects exist, or in applications that involve thousands of objects (e.g., in logistics), where it is impossible to train a single model to learn all of the objects. Most current research on object segmentation for robotic grasping focuses on class-level object segmentation (e.g., box, cup, bottle), closed sets (the specific objects of a dataset, for example the YCB dataset), or deep learning-based template matching. In this work, we are interested in open sets, where the number of classes is unknown and varying and there is no prior knowledge about the objects' types. We consider each specific object as its own separate class. Our goal is to develop an object detector that requires no fine-tuning and can add any object as a class just by capturing a few images of the object. Our main idea is to split the segmentation pipeline into two steps by cascading an unseen object segmentation network with a class-adaptive classifier. We evaluate our class-adaptive object detector on unseen datasets and compare it to a Mask R-CNN trained on those datasets. The results show that the performance varies from practical to unsuitable depending on the environment setup and the objects being handled. The code is available in our DoUnseen library repository.
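The second step of the pipeline described above, a classifier whose classes can be added and removed without retraining, can be illustrated with a minimal sketch. This is not the DoUnseen API: the names `ObjectGallery`, `add_object`, `remove_object`, and `classify` are hypothetical, and the use of cosine similarity over feature embeddings with a rejection threshold is an assumption made for illustration. The first step (unseen object segmentation) is not implemented here; the sketch assumes each segment and each reference image has already been turned into a feature vector by some embedding model.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class ObjectGallery:
    """Hypothetical class-adaptive classifier: one class per specific object.

    Classes are stored as lists of reference embeddings, so adding or
    deleting a class is a dictionary update and needs no training.
    """

    def __init__(self, threshold: float = 0.5) -> None:
        # Assumed rejection threshold: segments scoring below it stay unlabeled.
        self.threshold = threshold
        self._classes: Dict[str, List[List[float]]] = {}

    def add_object(self, name: str, embeddings: Sequence[Vector]) -> None:
        """Register an object from the embeddings of a few captured images."""
        self._classes[name] = [list(e) for e in embeddings]

    def remove_object(self, name: str) -> None:
        """Delete a class on the fly; unknown names are ignored."""
        self._classes.pop(name, None)

    def classify(self, embedding: Vector) -> Optional[str]:
        """Return the best-matching object name, or None if nothing passes the threshold."""
        best_name: Optional[str] = None
        best_score = self.threshold
        for name, references in self._classes.items():
            # Score a class by its most similar reference embedding.
            score = max(
                (cosine_similarity(embedding, r) for r in references),
                default=0.0,
            )
            if score >= best_score:
                best_name, best_score = name, score
        return best_name


# Toy 3-D embeddings standing in for the output of a real feature extractor.
gallery = ObjectGallery(threshold=0.8)
gallery.add_object("mug_A", [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]])
gallery.add_object("box_B", [[0.0, 1.0, 0.0]])

# Embeddings of two segments produced by the (assumed) segmentation step.
segment_embeddings = [[0.95, 0.05, 0.0], [0.0, 0.0, 1.0]]
labels = [gallery.classify(e) for e in segment_embeddings]

# Deleting a class takes effect immediately, with no retraining.
gallery.remove_object("mug_A")
labels_after_removal = [gallery.classify(e) for e in segment_embeddings]
```

In this toy setup the first segment matches `mug_A` until that class is removed, after which it is left unlabeled; the second segment never matches a registered object. Storing classes as reference embeddings is what makes adding an object from a few images cheap, at the cost of depending entirely on how well the embedding separates similar-looking objects.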