Medical image classification is a fundamental and crucial task in computer vision. In recent years, CNN-based and Transformer-based models have been widely used to classify various medical images. However, the limited long-range modeling capability of CNNs prevents them from effectively extracting features from medical images, while Transformers are hampered by their quadratic computational complexity. Recent research has shown that state space models (SSMs), represented by Mamba, can efficiently model long-range interactions while maintaining linear computational complexity. Inspired by this, we propose Vision Mamba for medical image classification (MedMamba). More specifically, we introduce a novel Conv-SSM module, which combines the local feature extraction ability of convolutional layers with the ability of SSMs to capture long-range dependencies, thereby modeling medical images of different modalities. To demonstrate the potential of MedMamba, we conducted extensive experiments on 14 publicly available medical datasets acquired with different imaging techniques and two private datasets that we built ourselves. The experimental results demonstrate that the proposed MedMamba performs well in detecting lesions in various medical images. To the best of our knowledge, this is the first Vision Mamba tailored for medical image classification. The purpose of this work is to establish a new baseline for medical image classification tasks and to provide valuable insights for the future development of more efficient and effective SSM-based artificial intelligence algorithms and application systems in the medical field. Source code is available at https://github.com/YubiaoYue/MedMamba.
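The idea of pairing a local convolutional branch with a linear-time state space branch can be illustrated with a toy one-dimensional sketch. This is not the MedMamba implementation: the function names (`local_conv1d`, `ssm_scan`, `conv_ssm_block`), the fixed scalar parameters `a`, `b`, `c`, and the elementwise-sum fusion are all assumptions made for illustration. Mamba itself uses input-dependent (selective) parameters, and the actual module operates on 2D feature maps.

```python
# Illustrative sketch only: NOT the MedMamba implementation.
# All names, parameters, and the fusion rule are hypothetical.

def local_conv1d(x, kernel):
    """Local branch: zero-padded 'same' 1D convolution over a sequence."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):  # positions outside the sequence count as zero
                acc += w * x[j]
        out.append(acc)
    return out


def ssm_scan(x, a, b, c):
    """Global branch: scalar linear state space recurrence.

    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t
    A single pass over the sequence, so the cost is linear in len(x).
    """
    h = 0.0
    out = []
    for x_t in x:
        h = a * h + b * x_t
        out.append(c * h)
    return out


def conv_ssm_block(x, kernel, a, b, c):
    """Fuse the two branches by elementwise sum (an assumed fusion rule)."""
    local = local_conv1d(x, kernel)
    global_ = ssm_scan(x, a, b, c)
    return [l + g for l, g in zip(local, global_)]


tokens = [1.0, 0.0, 0.0, 0.0]  # impulse at position 0
y = conv_ssm_block(tokens, kernel=[0.25, 0.5, 0.25], a=0.5, b=1.0, c=1.0)
print(y)  # [1.5, 0.75, 0.25, 0.125]
```

In this toy example the convolutional branch responds only within one position of the impulse, whereas the recurrent state carries a decaying trace of it to every later position, which is the complementary behavior the Conv-SSM description relies on.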