Medical image segmentation increasingly relies on deep learning techniques, yet their strong performance often comes at a high annotation cost. This paper introduces Weak-Mamba-UNet, an innovative weakly-supervised learning (WSL) framework for medical image segmentation with scribble-based annotations. It combines the strengths of Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), and the recent Visual Mamba (VMamba) architecture. The proposed WSL strategy uses three networks that differ in architecture but share the same symmetrical encoder-decoder structure: a CNN-based UNet for detailed local feature extraction, a Swin Transformer-based SwinUNet for comprehensive global context understanding, and a VMamba-based Mamba-UNet for efficient long-range dependency modeling. The key concept of the framework is a collaborative cross-supervisory mechanism in which pseudo labels drive iterative learning and refinement across the networks. Weak-Mamba-UNet is validated on a publicly available MRI cardiac segmentation dataset with processed scribble annotations, where it outperforms comparable WSL frameworks that use only UNet or only SwinUNet. These results highlight its potential in scenarios with sparse or imprecise annotations. The source code is publicly available.
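To make the idea of scribble supervision combined with cross-network pseudo labels more concrete, here is a minimal NumPy sketch. It rests on several assumptions that the abstract does not state, so it should not be read as the authors' exact formulation. First, each network's loss is a partial cross-entropy computed only on scribble-annotated pixels. Second, the shared pseudo label is the argmax of the three networks' averaged softmax outputs. Third, each network is pulled toward that pseudo label with a Dice loss. The names `IGNORE`, `partial_cross_entropy`, `ensemble_pseudo_label`, `dice_loss`, `total_loss`, and the `weight` parameter are hypothetical.

```python
import numpy as np

IGNORE = -1  # hypothetical marker for pixels not covered by any scribble


def softmax(logits, axis=0):
    """Numerically stable softmax over the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def partial_cross_entropy(probs, scribble, eps=1e-8):
    """Cross-entropy on scribble-annotated pixels only.

    probs: (C, H, W) class probabilities; scribble: (H, W) ints, IGNORE = unlabeled.
    """
    mask = scribble != IGNORE
    if not mask.any():
        return 0.0
    labels = scribble[mask]                      # (K,) annotated class ids
    p = probs[:, mask]                           # (C, K) probabilities at annotated pixels
    picked = p[labels, np.arange(labels.size)]   # probability assigned to the scribbled class
    return float(-np.mean(np.log(picked + eps)))


def ensemble_pseudo_label(prob_list):
    """Assumed pseudo-label rule: argmax of the averaged predictions of all networks."""
    return np.mean(prob_list, axis=0).argmax(axis=0)  # (H, W)


def dice_loss(probs, label, eps=1e-6):
    """Soft multi-class Dice loss between probabilities and a hard label map."""
    num_classes = probs.shape[0]
    one_hot = np.eye(num_classes)[label].transpose(2, 0, 1)  # (C, H, W)
    inter = (probs * one_hot).sum(axis=(1, 2))
    union = probs.sum(axis=(1, 2)) + one_hot.sum(axis=(1, 2))
    return float(1.0 - np.mean((2 * inter + eps) / (union + eps)))


def total_loss(logit_list, scribble, weight=0.5):
    """Sum over networks of scribble loss + weighted Dice against the shared pseudo label."""
    probs = [softmax(logits) for logits in logit_list]
    pseudo = ensemble_pseudo_label(probs)
    loss = 0.0
    for p in probs:
        loss += partial_cross_entropy(p, scribble) + weight * dice_loss(p, pseudo)
    return loss, pseudo


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, H, W = 4, 8, 8
    # Random stand-ins for the UNet, SwinUNet and Mamba-UNet outputs.
    logits = [rng.normal(size=(C, H, W)) for _ in range(3)]
    scribble = np.full((H, W), IGNORE)
    scribble[2, 1:6] = 1
    scribble[5, 2:7] = 3
    loss, pseudo = total_loss(logits, scribble)
    print(round(loss, 4), pseudo.shape)
```

Averaging the three networks before taking the argmax means that no single architecture dictates the pseudo label. In a real training loop the pseudo label would typically be detached from the gradient, and the loss weighting might be scheduled over training. Both are left out here for brevity.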