Current zero-shot Camouflaged Object Segmentation (COS) methods typically adopt a two-stage discover-then-segment pipeline: a multimodal large language model (MLLM) first produces visual prompts, and the Segment Anything Model (SAM) then performs segmentation. However, relying solely on MLLMs for camouflaged object discovery often leads to inaccurate localization, false positives, and missed detections. To address these issues, we propose the \textbf{D}iscover-\textbf{S}egment-\textbf{S}elect (\textbf{DSS}) mechanism, a progressive framework that refines segmentation step by step. DSS comprises a Feature-coherent Object Discovery (FOD) module that leverages visual features to generate diverse object proposals, a segmentation module that refines these proposals via SAM, and a Semantic-driven Mask Selection (SMS) module that employs MLLMs to evaluate the candidate masks and select the optimal one. Without any training or supervision, DSS achieves state-of-the-art performance on multiple COS benchmarks, particularly in multi-instance scenes.
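The three-stage flow described above can be sketched in a few lines of Python. Every function here is a hypothetical placeholder standing in for the real components (FOD proposal generation, SAM segmentation, MLLM-based mask scoring); only the control flow reflects the DSS mechanism, not any actual implementation.

```python
# Hedged sketch of the Discover-Segment-Select (DSS) control flow.
# discover/segment/select are placeholder stubs, NOT the paper's actual
# FOD, SAM, or SMS modules.

def discover(image):
    """FOD stand-in: propose diverse candidate boxes from visual features.
    Here we simply return two fixed (x0, y0, x1, y1) boxes."""
    return [(10, 10, 50, 50), (30, 40, 80, 90)]

def segment(image, proposal):
    """SAM stand-in: turn a box prompt into a mask.
    Here the 'mask' is just the box itself plus its area."""
    x0, y0, x1, y1 = proposal
    return {"box": proposal, "mask_area": (x1 - x0) * (y1 - y0)}

def select(image, candidate_masks):
    """SMS stand-in: an MLLM would score each candidate semantically;
    here we pick the largest-area mask as a trivial proxy."""
    return max(candidate_masks, key=lambda m: m["mask_area"])

def dss(image):
    """Progressive pipeline: discover many, segment all, select one."""
    proposals = discover(image)                      # stage 1: Discover
    masks = [segment(image, p) for p in proposals]   # stage 2: Segment
    return select(image, masks)                      # stage 3: Select
```

The key design point illustrated is that discovery errors are tolerated rather than avoided: the pipeline deliberately over-generates proposals in stage 1 and defers the semantic decision to stage 3, instead of trusting a single MLLM-produced prompt up front.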