In recent advances in medical image analysis, Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have set significant benchmarks. The former excel at capturing local features through convolution operations, while the latter achieve strong global context understanding through self-attention mechanisms. However, both architectures are limited in how efficiently they model long-range dependencies within medical images, which is critical for precise segmentation. Inspired by Mamba, a State Space Model (SSM) known for handling long sequences and global contextual information with improved computational efficiency, we propose Mamba-UNet, a novel architecture that combines the U-Net design for medical image segmentation with Mamba's capabilities. Mamba-UNet adopts a pure Visual Mamba (VMamba)-based encoder-decoder structure with skip connections that preserve spatial information across the different scales of the network. This design supports comprehensive feature learning, capturing both intricate details and broader semantic context within medical images. We introduce a novel integration mechanism within the VMamba blocks to ensure seamless connectivity and information flow between the encoder and decoder paths, which improves segmentation performance. We conducted experiments on a publicly available cardiac MRI multi-structure segmentation dataset. The results show that Mamba-UNet outperforms UNet and Swin-UNet in medical image segmentation under the same hyper-parameter settings. The source code and baseline implementations are available.
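To make the efficiency argument for state space models more concrete, the following is a minimal sketch of the discrete linear recurrence that SSMs build on, assuming a diagonal state matrix and a scalar input sequence. It is not the selective scan used in Mamba, nor the VMamba block of Mamba-UNet; the function name `ssm_scan` and the parameters `a`, `b`, and `c` are illustrative assumptions. The point it shows is that each output depends on the entire input history through a fixed-size hidden state, so the cost grows linearly with sequence length.

```python
def ssm_scan(xs, a, b, c):
    """Run a discrete linear state space recurrence over a 1-D sequence.

    Hypothetical toy model with a diagonal state matrix:
        h_t = a * h_{t-1} + b * x_t   (elementwise over the state)
        y_t = sum(c * h_t)
    xs is a list of scalar inputs; a, b, c are lists of equal length
    (the state size). Returns the list of scalar outputs y_t.
    """
    h = [0.0] * len(a)  # hidden state, initialised to zero
    ys = []
    for x in xs:  # one pass over the sequence: linear in len(xs)
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


if __name__ == "__main__":
    # Impulse input: the first sample keeps influencing later outputs,
    # decaying by the factor a = 0.5 at every step.
    print(ssm_scan([1.0, 0.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))
    # [1.0, 0.5, 0.25, 0.125]
```

In this toy setting the hidden state carries information from the first input to every later position without any pairwise comparison between positions, which is the property that makes SSM-style layers attractive for long sequences. Mamba additionally makes the recurrence parameters depend on the input, which this sketch does not attempt to reproduce.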