Medical image segmentation is essential to diagnosis, treatment planning, and healthcare more broadly, and deep learning has driven much of its recent progress. Convolutional Neural Networks (CNNs) excel at capturing local image features, whereas Vision Transformers (ViTs) model long-range dependencies through multi-head self-attention. Despite these strengths, both CNNs and ViTs struggle to process long-range dependencies in medical images efficiently, often requiring substantial computational resources. This issue, combined with the high cost and limited availability of expert annotations, poses a significant obstacle to precise segmentation. To address these challenges, this paper introduces Semi-Mamba-UNet, which integrates a visual Mamba-based UNet with a conventional UNet in a semi-supervised learning (SSL) framework. Drawing inspiration from consistency regularization, the SSL approach uses the two networks to jointly generate pseudo labels and to cross-supervise each other. Furthermore, we introduce a self-supervised pixel-level contrastive learning strategy that employs a pair of projectors to further enhance feature learning. A comprehensive evaluation on a publicly available cardiac MRI segmentation dataset, comparing against various SSL frameworks with different UNet-based segmentation networks, demonstrates the superior performance of Semi-Mamba-UNet. The source code is publicly available.
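The cross-supervision idea described above can be illustrated with a minimal NumPy sketch. It is not the released implementation: the function names, the tensor shapes, and the plain unweighted cross-entropy are assumptions made for illustration only, and the actual loss may include additional terms or weighting. Each network's hard pseudo labels (the arg-max of its logits) serve as the training target for the other network on unlabeled images.

```python
import numpy as np

def softmax(logits, axis=1):
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def cross_entropy(logits, labels):
    """Mean pixel-wise cross-entropy.

    logits: float array of shape (N, C, H, W)
    labels: integer array of shape (N, H, W) with values in [0, C)
    """
    probs = softmax(logits, axis=1)
    # Select the probability assigned to the target class at every pixel.
    picked = np.take_along_axis(probs, labels[:, None, :, :], axis=1)
    # Clip to avoid log(0); the upper bound keeps the loss non-negative.
    return float(-np.log(np.clip(picked, 1e-12, 1.0)).mean())

def cross_pseudo_supervision_loss(logits_a, logits_b):
    """Hypothetical helper: each network is supervised by the other's pseudo labels.

    In a real training loop the pseudo labels would be treated as constants
    (no gradient flows through the arg-max).
    """
    pseudo_a = logits_a.argmax(axis=1)  # pseudo labels from network A
    pseudo_b = logits_b.argmax(axis=1)  # pseudo labels from network B
    return cross_entropy(logits_a, pseudo_b) + cross_entropy(logits_b, pseudo_a)

# Stand-in predictions of the two networks on an unlabeled batch
# (2 images, 4 classes, 8x8 pixels); random values replace real outputs.
rng = np.random.default_rng(0)
logits_unet = rng.normal(size=(2, 4, 8, 8))
logits_mamba = rng.normal(size=(2, 4, 8, 8))
loss = cross_pseudo_supervision_loss(logits_unet, logits_mamba)
print(f"cross-supervision loss: {loss:.4f}")
```

When the two networks agree confidently, the loss approaches zero; disagreement is penalized symmetrically, which is what pushes the two architectures toward consistent predictions on unlabeled data.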