Instance segmentation of completely occluded objects and of dense objects are two challenging tasks in robot vision measurement. To handle both, this paper proposes a unified coarse-to-fine instance segmentation framework, CFNet, built on box prompt-based segmentation foundation models (BSMs) such as the Segment Anything Model. Specifically, CFNet first detects oriented bounding boxes (OBBs) to distinguish instances and provide coarse localization. It then predicts a mask for each OBB prompt to obtain the fine segmentation. To predict occluded object instances, CFNet performs instance segmentation with OBBs that contain only the partial object boundaries visible on occluders, which overcomes the difficulty that existing amodal instance segmentation methods have in predicting occluded objects directly. In addition, because OBBs serve only as prompts, CFNet alleviates the over-dependence on bounding box detection performance that affects current OBB-based instance segmentation methods for dense objects. To enable BSMs to handle OBB prompts, we propose a novel OBB prompt encoder. To make CFNet more lightweight, we apply knowledge distillation to it and introduce a Gaussian label smoothing method for the teacher model outputs. Experiments demonstrate that CFNet outperforms current instance segmentation methods on both industrial and public datasets. The code is available at https://github.com/zhen6618/OBBInstanceSegmentation.
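To make the notion of an OBB prompt concrete, the following is a minimal sketch of the underlying geometry only. It assumes an OBB is parameterized as (cx, cy, w, h, theta) with theta in radians; this parameterization and the helper names `obb_to_corners` and `enclosing_box` are illustrative assumptions, not taken from CFNet or its OBB prompt encoder. The sketch converts an OBB to its four corner points and to the axis-aligned box that encloses it, which is the form of box prompt an unmodified box-prompted model would typically accept.

```python
import math


def obb_to_corners(cx, cy, w, h, theta):
    """Return the four corners of an oriented box.

    Assumed parameterization (hypothetical, for illustration):
    center (cx, cy), side lengths w and h, rotation theta in radians.
    """
    c, s = math.cos(theta), math.sin(theta)
    # Corner offsets in the box's own (unrotated) frame.
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset by theta, then translate to the center.
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in half]


def enclosing_box(corners):
    """Return the axis-aligned box (x_min, y_min, x_max, y_max) around the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))


def box_area(box):
    """Area of an axis-aligned box given as (x_min, y_min, x_max, y_max)."""
    x0, y0, x1, y1 = box
    return (x1 - x0) * (y1 - y0)


if __name__ == "__main__":
    # A thin object rotated by 45 degrees.
    corners = obb_to_corners(10.0, 10.0, 4.0, 1.0, math.pi / 4)
    aabb = enclosing_box(corners)
    print("corners:", corners)
    print("enclosing box:", aabb)
    print("OBB area:", 4.0 * 1.0, "enclosing box area:", box_area(aabb))
```

For a thin, rotated object the enclosing axis-aligned box covers considerably more area than the OBB itself, so in a dense scene it tends to overlap neighboring instances. This is one plausible reason for prompting with the OBB directly rather than with its axis-aligned envelope.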