Instance segmentation (IS) of completely occluded and dense objects is an important and challenging task. Although current amodal IS methods can predict the invisible regions of occluded objects, they struggle to directly predict objects that are completely occluded. For dense object IS, existing box-based methods depend heavily on the performance of bounding box detection. In this paper, we propose CFNet, a coarse-to-fine IS framework for completely occluded and dense objects that builds on box prompt-based segmentation foundation models (BSMs). Specifically, CFNet first detects oriented bounding boxes (OBBs) to distinguish instances and provide coarse localization information. It then predicts masks conditioned on the OBB prompts for fine segmentation. To predict completely occluded object instances, CFNet performs IS on the occluders and exploits prior geometric properties, which avoids the difficulty of predicting completely occluded instances directly. Furthermore, because it builds on BSMs, CFNet is less dependent on bounding box detection performance, which improves dense object IS. We also propose a novel OBB prompt encoder for BSMs. To make CFNet more lightweight, we apply knowledge distillation to it and introduce a Gaussian smoothing method for the teacher targets. Experimental results demonstrate that CFNet achieves the best performance on both industrial and publicly available datasets.
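To make two of the ingredients named in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes an OBB is parameterized as (center x, center y, width, height, rotation angle in radians) and that a teacher target is a per-pixel probability map; the abstract specifies neither. The function names `obb_to_corners`, `gaussian_kernel_1d`, and `smooth_teacher_target` are hypothetical, and the separable Gaussian filter is only one plausible reading of "Gaussian smoothing for teacher targets".

```python
import numpy as np


def obb_to_corners(cx, cy, w, h, angle_rad):
    """Return the four corners (4x2 array) of an oriented bounding box.

    Assumption: the box is given as center, size, and a rotation angle in
    radians. The paper's actual OBB parameterization is not stated.
    """
    # Corners of the axis-aligned box centered at the origin.
    half = np.array([[-w / 2, -h / 2], [w / 2, -h / 2],
                     [w / 2, h / 2], [-w / 2, h / 2]], dtype=np.float64)
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, -s], [s, c]])
    # Rotate each corner, then translate to the box center.
    return half @ rot.T + np.array([cx, cy], dtype=np.float64)


def gaussian_kernel_1d(sigma, radius=None):
    """Normalized 1D Gaussian kernel; the radius defaults to about 3 sigma."""
    if radius is None:
        radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def smooth_teacher_target(prob_map, sigma=1.0):
    """Blur a 2D teacher probability map with a separable Gaussian filter.

    Assumption: this is an illustrative stand-in for the smoothing method
    mentioned in the abstract, whose exact form is not described there.
    """
    k = gaussian_kernel_1d(sigma)
    pad = len(k) // 2
    # Edge padding keeps the output the same size as the input.
    padded = np.pad(prob_map.astype(np.float64), pad, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode="valid"), 0, rows)


if __name__ == "__main__":
    corners = obb_to_corners(cx=10.0, cy=5.0, w=4.0, h=2.0, angle_rad=np.pi / 6)
    print(corners)

    hard_mask = np.zeros((9, 9))
    hard_mask[:, 5:] = 1.0  # a hard vertical boundary in the teacher output
    print(smooth_teacher_target(hard_mask, sigma=1.0)[4])
```

In this sketch the corner points could serve as the raw input to a prompt encoder, and the smoothed map replaces the hard 0/1 boundary with a gradual transition, which is the usual motivation for softening distillation targets.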