Polyp segmentation in colonoscopy is crucial for detecting colorectal cancer. However, it is challenging due to variations in the structure, color, and size of polyps, as well as the lack of clear boundaries with surrounding tissues. Traditional segmentation models based on Convolutional Neural Networks (CNNs) struggle to capture detailed patterns and global context, limiting their performance. Vision Transformer (ViT)-based models address some of these issues but have difficulty capturing local context and lack strong zero-shot generalization. To this end, we propose the Mamba-guided Segment Anything Model (SAM-Mamba) for efficient polyp segmentation. Our approach introduces a Mamba-Prior module in the encoder to bridge the gap between the general pre-trained representation of SAM and subtle, polyp-relevant cues. It injects salient cues from polyp images into the SAM image encoder as a domain prior while capturing global dependencies at multiple scales, leading to more accurate segmentation results. Extensive experiments on five benchmark datasets show that SAM-Mamba outperforms traditional CNN, ViT, and Adapter-based models in both quantitative and qualitative evaluations. Additionally, SAM-Mamba demonstrates excellent adaptability to unseen datasets, making it highly suitable for real-time clinical use.
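To make the prior-injection idea concrete, the following PyTorch sketch shows one plausible way a Mamba-Prior-style adapter could feed multi-scale, polyp-specific cues into the token stream of a frozen SAM image encoder. It is a minimal illustration, not the paper's implementation: the class names (`MambaPriorAdapter`, `SimpleSSMBlock`), the chosen scales, and the gated depthwise-convolution block standing in for a true Mamba selective-scan layer are all assumptions made for readability.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleSSMBlock(nn.Module):
    """Stand-in for a Mamba-style sequence-mixing block (assumption).

    A real implementation would use a selective state-space (Mamba) layer;
    here a depthwise 1D convolution plus a gated projection keeps the sketch
    self-contained and runnable.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.gate = nn.Linear(dim, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, C) tokens
        h = self.norm(x)
        h = self.conv(h.transpose(1, 2)).transpose(1, 2)   # token mixing
        h = self.proj(h * torch.sigmoid(self.gate(h)))     # gated update
        return x + h                                        # residual


class MambaPriorAdapter(nn.Module):
    """Hypothetical adapter injecting a polyp-domain prior into SAM tokens.

    Patch tokens from the (frozen) SAM image encoder are mixed at several
    spatial scales, fused, and added back residually as a domain prior.
    """

    def __init__(self, dim: int, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.mixers = nn.ModuleList(SimpleSSMBlock(dim) for _ in scales)
        self.fuse = nn.Linear(dim * len(scales), dim)

    def forward(self, sam_tokens: torch.Tensor) -> torch.Tensor:
        # sam_tokens: (B, H*W, C) patch tokens from the SAM image encoder
        B, N, C = sam_tokens.shape
        side = int(N ** 0.5)
        feats = []
        for s, mixer in zip(self.scales, self.mixers):
            x = sam_tokens.transpose(1, 2).reshape(B, C, side, side)
            if s > 1:                                   # coarser scale
                x = F.avg_pool2d(x, s)
            tokens = mixer(x.flatten(2).transpose(1, 2))  # global mixing
            x = tokens.transpose(1, 2).reshape(B, C, side // s, side // s)
            x = F.interpolate(x, size=(side, side), mode="bilinear")
            feats.append(x.flatten(2).transpose(1, 2))
        prior = self.fuse(torch.cat(feats, dim=-1))
        return sam_tokens + prior                        # inject domain prior
```

In such a setup, the SAM encoder weights would typically stay frozen and only the lightweight adapter would be trained on polyp data, which is consistent with the adapter-style tuning the abstract contrasts against.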