MedMamba: Vision Mamba for Medical Image Classification

Medical image classification is a fundamental and crucial task in computer vision. In recent years, CNN-based and Transformer-based models have been widely used to classify various medical images. Unfortunately, the limited long-range modeling capability of CNNs prevents them from effectively extracting fine-grained features in medical images, while Transformers are hampered by their quadratic computational complexity. Recent research has shown that state space models (SSMs), represented by Mamba, can efficiently model long-range interactions while maintaining linear computational complexity. Inspired by this, we propose Vision Mamba for medical image classification (MedMamba). More specifically, we introduce a novel Conv-SSM module, which combines the local feature extraction ability of convolutional layers with the ability of SSMs to capture long-range dependencies. To demonstrate the potential of MedMamba, we conduct extensive experiments on three publicly available medical datasets acquired with different imaging techniques (i.e., Kvasir (endoscopic images), FETAL_PLANES_DB (ultrasound images) and Covid19-Pneumonia-Normal Chest X-Ray (X-ray images)) and two private datasets that we built ourselves. Experimental results show that the proposed MedMamba performs well in detecting lesions in various medical images. To the best of our knowledge, this is the first Vision Mamba tailored for medical image classification. The purpose of this work is to establish a new baseline for medical image classification tasks and to provide valuable insights for the future development of more efficient and effective SSM-based artificial intelligence algorithms and application systems in the medical field. Source code is available at https://github.com/YubiaoYue/MedMamba.
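The idea of pairing a local convolution with a linear-time state space recurrence can be illustrated with a toy example. The sketch below is not the MedMamba implementation (see the repository for that). It is a minimal 1D illustration in which every name (local_conv, ssm_scan, conv_ssm_block), the half-and-half channel split, the fixed scalar parameters a, b and c, and the single scan direction are assumptions made for exposition. It only shows why a single recurrence pass costs time linear in the sequence length while still carrying information across the whole sequence, whereas the convolution sees a small window.

```python
from typing import List

Sequence = List[List[float]]  # layout: [position][channel]


def local_conv(x: List[float], kernel: List[float]) -> List[float]:
    """Sliding-window weighted sum with zero padding.

    Each output position only sees len(kernel) neighbouring inputs,
    which is the 'local feature extraction' role in this toy example.
    """
    half = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            i = t + k - half
            if 0 <= i < len(x):
                acc += w * x[i]
        out.append(acc)
    return out


def ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Scalar linear recurrence: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t.

    One pass over the input, so the cost grows linearly with its length,
    yet h_t depends on every earlier input (long-range dependency).
    Fixed scalar a, b, c are a simplifying assumption of this sketch.
    """
    h = 0.0
    out = []
    for x_t in x:
        h = a * h + b * x_t
        out.append(c * h)
    return out


def conv_ssm_block(seq: Sequence, kernel: List[float],
                   a: float, b: float, c: float) -> Sequence:
    """Hypothetical two-branch block: the first half of the channels goes
    through the local convolution, the second half through the SSM scan,
    and the results are placed back in their original channel positions."""
    length, channels = len(seq), len(seq[0])
    split = channels // 2
    columns = [[row[ch] for row in seq] for ch in range(channels)]
    processed = [
        local_conv(col, kernel) if ch < split else ssm_scan(col, a, b, c)
        for ch, col in enumerate(columns)
    ]
    return [[processed[ch][t] for ch in range(channels)] for t in range(length)]
```

Feeding a unit impulse at the first position into both channels makes the contrast visible: the convolution channel responds only at the impulse and its immediate neighbour, while the recurrence channel keeps a decaying trace of the impulse at every later position.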
