In the realm of medical image segmentation, both CNN-based and Transformer-based models have been extensively explored. However, CNNs exhibit limitations in long-range modeling, whereas Transformers are hampered by their quadratic computational complexity. Recently, State Space Models (SSMs), exemplified by Mamba, have emerged as a promising alternative: they not only excel at modeling long-range interactions but also maintain linear computational complexity. In this paper, leveraging state space models, we propose a U-shaped architecture for medical image segmentation, named Vision Mamba UNet (VM-UNet). Specifically, the Visual State Space (VSS) block is introduced as the foundation block to capture extensive contextual information, and an asymmetric encoder-decoder structure is constructed with fewer convolution layers to reduce computational cost. We conduct comprehensive experiments on the ISIC17, ISIC18, and Synapse datasets, and the results indicate that VM-UNet performs competitively on medical image segmentation tasks. To the best of our knowledge, this is the first medical image segmentation model built on a pure SSM-based backbone. We aim to establish a baseline and provide valuable insights for the future development of more efficient and effective SSM-based segmentation systems. Our code is available at https://github.com/JCruan519/VM-UNet.
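The linear-complexity claim for SSMs comes from their recurrent form: the output at each step depends on a fixed-size hidden state, so one pass over the sequence suffices. As a minimal, hypothetical sketch (a scalar toy recurrence, not the authors' VSS block or Mamba's selective scan), consider:

```python
def ssm_scan(x, a=0.9, b=0.5, c=1.0):
    """Run h_t = a*h_{t-1} + b*x_t ; y_t = c*h_t over a 1-D sequence.

    A single pass over the sequence costs O(L) time, in contrast to
    the O(L^2) pairwise interactions of Transformer attention.
    The parameters a, b, c are illustrative toy values, not anything
    from VM-UNet.
    """
    h = 0.0
    ys = []
    for xt in x:
        h = a * h + b * xt   # state update: `a` controls long-range memory
        ys.append(c * h)     # readout from the hidden state
    return ys

# A unit impulse decays geometrically (0.5, 0.45, 0.405, ...):
# the state carries information about past inputs arbitrarily far forward.
print(ssm_scan([1.0, 0.0, 0.0, 0.0]))
```

Mamba extends this idea by making the transition parameters input-dependent (the "selective" scan) and applying it with vector-valued states, which is what the VSS block adapts to 2-D visual features.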