Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation

In recent advances in medical image analysis, Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have set significant benchmarks. The former excel at capturing local features through convolution operations, while the latter achieve strong global context understanding through self-attention. However, both architectures are limited in how efficiently they model long-range dependencies within medical images, which is critical for precise segmentation. Inspired by Mamba, a State Space Model (SSM) known for handling long sequences and global contextual information with improved computational efficiency, we propose Mamba-UNet, a novel architecture that combines the U-Net design used in medical image segmentation with Mamba's capabilities. Mamba-UNet adopts a pure Visual Mamba (VMamba)-based encoder-decoder structure with skip connections that preserve spatial information across the different scales of the network. This design supports comprehensive feature learning, capturing both intricate details and broader semantic context within medical images. We introduce a novel integration mechanism within the VMamba blocks to ensure seamless connectivity and information flow between the encoder and decoder paths, improving segmentation performance. We conducted experiments on the publicly available ACDC cardiac MRI segmentation dataset and the Synapse abdominal CT segmentation dataset. The results show that Mamba-UNet outperforms several types of UNet in medical image segmentation under the same hyper-parameter settings. The source code and baseline implementations are available.
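
To make the efficiency claim for State Space Models concrete, the following is a minimal sketch of a discrete linear state space recurrence over a 1-D sequence. It is not the Mamba-UNet or VMamba implementation: the function name `ssm_scan` and the parameter values are hypothetical, the parameters are fixed rather than input-dependent as in Mamba, and a real visual model would scan 2-D patch sequences rather than a flat list. It only illustrates that one linear-time pass lets a hidden state carry information from distant positions.

```python
def ssm_scan(xs, a, b, c):
    """Run a diagonal linear state space recurrence over a 1-D sequence.

    For each step t (element-wise over the state channels):
        h_t = a * h_{t-1} + b * x_t
        y_t = sum(c * h_t)
    The loop visits each input once, so cost grows linearly with len(xs).
    """
    h = [0.0] * len(a)  # hidden state, one value per state channel
    ys = []
    for x in xs:
        # Decay the previous state and mix in the current input.
        h = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, h, b)]
        # Read out a scalar output from the state.
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys


# Hypothetical parameters, chosen only for illustration.
a = [0.9, 0.5]  # per-channel decay: closer to 1 means longer memory
b = [1.0, 1.0]  # input projection
c = [1.0, 1.0]  # output projection

# An impulse at t = 0 followed by zeros: later outputs show how long
# the state "remembers" the first input.
xs = [1.0] + [0.0] * 7
ys = ssm_scan(xs, a, b, c)
```

With these values the response at step t is 0.9^t + 0.5^t, so the slowly decaying channel still contributes at the last step even though the input was seen only at the start.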
