Existing work on reasoning segmentation either feeds hidden features from a language model directly into a mask decoder or represents positions as text, which limits interpretability and semantic detail. To address this, we present CoPRS, a Multi-modal Chain-of-Thought (MCoT)-based positional perception model that bridges language reasoning and segmentation through a differentiable, interpretable positional prior instantiated as a heatmap. By making the reasoning process explicit via MCoT and expressing its result as a dense, differentiable heatmap, this interface improves interpretability and diagnostic analysis and yields more concentrated evidence on the target. A learnable concentration token aggregates image and reasoning-text features to generate this positional prior, which a lightweight decoder then decodes into a precise mask, providing a direct connection between reasoning and segmentation. Across the RefCOCO series and ReasonSeg, CoPRS matches or surpasses the best reported metrics on each standard split, on both validation and test partitions, under comparable protocols. Extensive experiments demonstrate a strong positive correlation among the CoT trajectory, the generated heatmap, and the decoded mask, supporting an interpretable alignment between the reasoning output and downstream mask generation. Collectively, these findings support the utility of this paradigm in bridging reasoning and segmentation and show advantages in reasoning-driven concentration and in more precise mask prediction. Code has been released at https://github.com/ZhenyuLU-Heliodore/CoPRS.
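The heatmap interface described in the abstract can be illustrated with a toy sketch. This is not the released CoPRS code: the function names `positional_prior` and `decode_mask`, the scaled dot-product scoring, and the peak-relative threshold are all assumptions made for illustration. In particular, the threshold merely stands in for the learned lightweight decoder, and the fixed query vector stands in for the learnable concentration token.

```python
import math


def positional_prior(token, patch_feats, grid_h, grid_w):
    """Turn one query vector and per-patch features into a normalized heatmap.

    Hypothetical stand-in for the concentration-token interface: scaled
    dot-product scores between the token and every patch, followed by a
    softmax over all patches so the heatmap sums to 1.
    """
    dim = len(token)
    scores = [
        sum(t * p for t, p in zip(token, patch)) / math.sqrt(dim)
        for patch in patch_feats
    ]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Reshape the flat patch list into a grid_h x grid_w heatmap.
    return [probs[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)]


def decode_mask(heatmap, rel_threshold=0.5):
    """Placeholder decoder (assumption): threshold relative to the heatmap peak."""
    peak = max(max(row) for row in heatmap)
    return [[1 if v >= rel_threshold * peak else 0 for v in row] for row in heatmap]


# Toy input: a 4x4 grid of 4-dimensional patch features with small values.
grid_h, grid_w, dim = 4, 4, 4
token = [1.0, 0.5, -0.5, 0.25]
patch_feats = [
    [0.1 * ((i + j) % 3 - 1) for j in range(dim)]
    for i in range(grid_h * grid_w)
]
# Plant a "target" patch at row 1, column 2 whose feature aligns with the token.
patch_feats[1 * grid_w + 2] = [2.0 * t for t in token]

heatmap = positional_prior(token, patch_feats, grid_h, grid_w)
mask = decode_mask(heatmap)

for row in mask:
    print(row)
```

In this toy setting the heatmap mass concentrates on the planted patch, and the thresholded mask selects only that cell. In a real model the scores would come from learned attention over image and reasoning-text features, and every step up to the mask would remain differentiable.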