Single-point annotation, which aims to minimize labelling costs in visual tasks, is becoming increasingly prominent in research. Recently, visual foundation models such as Segment Anything (SAM) have gained widespread use owing to their robust zero-shot capabilities and exceptional annotation performance. However, SAM's class-agnostic output and its high confidence in local segmentation introduce 'semantic ambiguity', which poses a challenge for precise category-specific segmentation. To tackle this challenge, we introduce a cost-effective category-specific segmenter built on SAM. Specifically, we devise a Semantic-Aware Instance Segmentation Network (SAPNet) that integrates Multiple Instance Learning (MIL) with matching capability and SAM with point prompts. SAPNet selects the most representative mask proposals generated by SAM to supervise segmentation, with a specific focus on object category information. Moreover, we introduce Point Distance Guidance and a Box Mining Strategy to mitigate two challenges inherent in weakly supervised segmentation: the 'group' and 'local' issues. These strategies further enhance overall segmentation performance. Experimental results on Pascal VOC and COCO demonstrate the promising performance of SAPNet, highlighting its semantic matching capabilities and its potential to advance point-prompted instance segmentation. The code will be made publicly available.