Image segmentation plays a vital role in diagnosis and treatment in the medical domain. Traditional convolutional neural networks (CNNs) and Transformer models have made significant advances in this area, but they still face challenges due to limited receptive fields or high computational complexity. Recently, State Space Models (SSMs), particularly Mamba and its variants, have demonstrated notable performance in vision tasks. However, their feature extraction methods may not be sufficiently effective, and they retain redundant structures, leaving room for parameter reduction. Motivated by previous spatial and channel attention methods, we propose Triplet Mamba-UNet (TM-UNet). The method leverages residual VSS blocks to extract intensive contextual features, while a Triplet SSM fuses features across the spatial and channel dimensions. Experiments on the ISIC17, ISIC18, CVC-300, CVC-ClinicDB, Kvasir-SEG, CVC-ColonDB, and Kvasir-Instrument datasets demonstrate the superior segmentation performance of the proposed TM-UNet. Moreover, compared with the previous VM-UNet, our model reduces the parameter count by one third.
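To make the spatial-and-channel fusion idea concrete, the following is a minimal NumPy sketch of combining a channel-attention branch with a spatial-attention branch on a feature map. This is an illustrative toy in the spirit of the attention methods the abstract cites, not the paper's actual Triplet SSM; the function names and the equal-weight averaging of the two branches are assumptions.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(x):
    # x: (C, H, W). Global-average-pool each channel, softmax over channels,
    # then rescale the feature map channel-wise.
    w = softmax(x.mean(axis=(1, 2)), axis=0)           # (C,)
    return x * w[:, None, None]

def spatial_attention(x):
    # Pool across channels to an (H, W) map, squash to (0, 1) with a sigmoid,
    # then rescale the feature map spatially.
    m = 1.0 / (1.0 + np.exp(-x.mean(axis=0)))          # (H, W)
    return x * m[None, :, :]

def spatial_channel_fusion(x):
    # Hypothetical fusion: average the two attention branches so both the
    # spatial and channel dimensions modulate the output.
    return 0.5 * (channel_attention(x) + spatial_attention(x))

# Toy feature map: 4 channels on an 8x8 grid.
x = np.random.rand(4, 8, 8).astype(np.float32)
y = spatial_channel_fusion(x)
print(y.shape)  # (4, 8, 8)
```

The output keeps the input's shape, so such a fusion block can be dropped between encoder and decoder stages of a UNet-style model without changing tensor dimensions.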