MedMamba: Vision Mamba for Medical Image Classification

Since the advent of deep learning, convolutional neural networks (CNNs) and vision transformers (ViTs) have been extensively studied and widely used in medical image classification. However, the limited ability of CNNs to model long-range dependencies results in suboptimal classification performance. ViTs, in contrast, are hampered by the quadratic computational complexity of their self-attention mechanism, which makes them difficult to deploy in real-world settings with limited computational resources. Recent studies have shown that state space models (SSMs), represented by Mamba, can effectively model long-range dependencies while maintaining linear computational complexity. Inspired by this, we propose MedMamba, the first vision Mamba for generalized medical image classification. Concretely, we introduce a novel hybrid basic block named SS-Conv-SSM, which combines convolutional layers for extracting local features with the ability of SSMs to capture long-range dependencies, with the aim of efficiently modeling medical images from different imaging modalities. By employing a grouped convolution strategy and a channel-shuffle operation, MedMamba reduces the number of model parameters and the computational burden, enabling efficient applications. To demonstrate the potential of MedMamba, we conducted extensive experiments on 16 datasets covering ten imaging modalities and 411,007 images. Experimental results show that MedMamba achieves competitive performance in classifying various medical images compared with state-of-the-art methods. Our work aims to establish a new baseline for medical image classification and to provide valuable insights for developing more powerful SSM-based artificial intelligence algorithms and application systems in the medical field. The source code and all pre-trained weights of MedMamba are available at https://github.com/YubiaoYue/MedMamba.

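The abstract credits grouped convolution and channel shuffle for the reduced parameter count. The sketch below illustrates why that combination helps, using only the standard weight-count formula for a 2D convolution and a reshape-transpose-flatten shuffle. It is not taken from the MedMamba code: the function names `conv_params` and `channel_shuffle`, the layer sizes (64 channels, 3x3 kernel, 4 groups), and the particular shuffle variant are assumptions chosen for illustration.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k 2D convolution, bias ignored.

    With `groups` > 1, each output channel only sees c_in / groups
    input channels, so the weight count shrinks by a factor of `groups`.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * k * k


def channel_shuffle(channels, groups):
    """Interleave channels across groups.

    Equivalent to viewing the flat channel list as (groups, per_group),
    transposing to (per_group, groups), and flattening again. This lets
    information mix between groups that a grouped convolution keeps apart.
    """
    n = len(channels)
    if n % groups:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


if __name__ == "__main__":
    # Illustrative sizes only; the abstract does not give layer dimensions.
    dense = conv_params(64, 64, 3)
    grouped = conv_params(64, 64, 3, groups=4)
    print(dense, grouped, dense // grouped)

    # Six channels in two groups: [0, 1, 2] and [3, 4, 5].
    print(channel_shuffle(list(range(6)), groups=2))
```

With these assumed sizes, grouping by 4 cuts the convolution weights by a factor of 4, and the shuffle reorders the channels so that the next grouped layer receives inputs from every previous group.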