Vision Mamba models have been extensively studied across various fields, addressing the limitations of previous models by effectively modeling long-range dependencies with linear-time complexity. Several subsequent studies have further built Vision Mamba upon the UNet architecture (VM-UNet) for medical image segmentation. These approaches primarily focus on optimizing architectural designs, creating more complex structures to enhance the model's ability to perceive semantic features. In this paper, we propose a simple yet effective approach, Dual Self-distillation for VM-UNet (DSVM-UNet), that improves the model without any complex architectural designs. To achieve this goal, we develop dual self-distillation methods to align features at both the global and local levels. Extensive experiments on the ISIC2017, ISIC2018, and Synapse benchmarks demonstrate that our approach achieves state-of-the-art performance while maintaining computational efficiency. Code is available at https://github.com/RoryShao/DSVM-UNet.git.
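The abstract does not specify the exact form of the dual self-distillation objective; a common formulation pairs a global, prediction-level term (KL divergence between per-pixel class distributions of a deeper "teacher" stage and a shallower "student" stage) with a local, feature-level term (MSE between intermediate feature maps). The sketch below illustrates this general idea in NumPy; the function name, the choice of KL + MSE, and the weighting `alpha` are assumptions for illustration, not the paper's actual loss.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the class axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dual_self_distillation_loss(deep_logits, shallow_logits,
                                deep_feat, shallow_feat, alpha=0.5):
    """Hypothetical sketch of a dual self-distillation loss.

    Global term: KL divergence between per-pixel class distributions,
    treating the deeper decoder stage as the teacher (in practice the
    teacher branch would be detached from the gradient graph).
    Local term: MSE between intermediate feature maps.
    `alpha` balances the two terms (assumed, not from the paper).
    """
    t = softmax(deep_logits)     # teacher class distribution
    s = softmax(shallow_logits)  # student class distribution
    kl = np.sum(t * (np.log(t + 1e-8) - np.log(s + 1e-8)), axis=-1).mean()
    mse = np.mean((deep_feat - shallow_feat) ** 2)
    return alpha * kl + (1.0 - alpha) * mse
```

With identical teacher and student tensors the loss is zero, and it grows as the shallow stage's predictions and features drift from the deep stage's, which is the alignment pressure self-distillation exploits.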