Semi-supervised medical image segmentation has become a central research problem in medical image analysis; it mitigates annotation scarcity by applying consistency regularization to unlabeled data. However, existing approaches operate predominantly through visual pattern matching and rely heavily on pixel-level similarity. This visual-centric dependency often fails in clinical scenarios characterized by visual-semantic mismatch, where visually similar lesions warrant distinct diagnostic conclusions, so these approaches do not capture the diagnostic logic that experts use. To address this, we move beyond visual cues and propose CERS (CoT-Enhanced Reasoning Segmentation), a framework that integrates Chain-of-Thought (CoT) reasoning to distinguish pathologically distinct cases. Specifically, we construct a knowledge pool enriched with linguistic reasoning descriptions generated by large language models (LLMs). We introduce a semantic-aware reference selection strategy to identify historical evidence: candidates are first filtered by morphology and then refined by CoT consistency to eliminate hard negatives. Furthermore, we design a multi-scale coordinate attention module (MCAM) to fuse this reasoning-derived context into the decoding process. Extensive experiments demonstrate the superiority of CERS over state-of-the-art approaches, particularly in resolving boundary ambiguities and semantic inconsistencies. The code is available at https://github.com/cymasuna/CERS.
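The two-stage reference selection described above can be illustrated with a minimal sketch. It is not the CERS implementation: the abstract does not specify how morphology or CoT consistency is measured, so this sketch assumes both are compared by cosine similarity between feature vectors, and the names `select_references`, `k_morph`, and `cot_threshold` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_references(query_morph, query_cot, pool, k_morph=3, cot_threshold=0.5):
    """Hypothetical two-stage selection over a knowledge pool.

    Each pool entry is a dict with an "id", a morphology vector "morph",
    and a reasoning-description embedding "cot" (both assumed, not from the paper).
    """
    # Stage 1: shortlist the k_morph entries most similar to the query in morphology.
    shortlist = sorted(
        pool, key=lambda e: cosine(query_morph, e["morph"]), reverse=True
    )[:k_morph]
    # Stage 2: keep only shortlisted entries whose reasoning agrees with the query,
    # which drops hard negatives (similar appearance, different reasoning).
    return [e["id"] for e in shortlist if cosine(query_cot, e["cot"]) >= cot_threshold]


pool = [
    {"id": "a", "morph": [1.0, 0.0], "cot": [1.0, 0.0]},
    {"id": "b", "morph": [0.9, 0.1], "cot": [0.0, 1.0]},  # hard negative
    {"id": "c", "morph": [0.0, 1.0], "cot": [1.0, 0.0]},
]
print(select_references([1.0, 0.0], [1.0, 0.0], pool, k_morph=2))  # → ['a']
```

In this toy pool, entry `b` survives the morphology shortlist because it looks almost identical to the query, but is removed in the second stage because its reasoning embedding is orthogonal to the query's.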