Medical image segmentation demands the aggregation of global and local feature representations, posing a challenge for current methodologies in handling both long-range and short-range feature interactions. Recently, Vision Mamba (ViM) models have emerged as promising solutions for reducing model complexity by excelling in long-range feature interactions with linear complexity. However, existing ViM approaches overlook the importance of preserving short-range local dependencies by directly flattening spatial tokens, and are constrained by fixed scanning patterns that limit the capture of dynamic spatial context information. To address these challenges, we introduce a simple yet effective method named context clustering ViM (CCViM), which incorporates a context clustering module within existing ViM models to segment image tokens into distinct windows for adaptable local clustering. Our method effectively combines long-range and short-range feature interactions, thereby enhancing spatial contextual representations for medical image segmentation tasks. Extensive experimental evaluations on diverse public datasets, i.e., Kumar, CPM17, ISIC17, ISIC18, and Synapse, demonstrate the superior performance of our method compared to current state-of-the-art methods. Our code can be found at https://github.com/zymissy/CCViM.
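The abstract's core idea of clustering tokens within local windows can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function name `context_cluster_window`, the center-proposal scheme (mean-pooling contiguous token groups), and the similarity-weighted aggregation are all assumptions chosen to convey the general technique of similarity-based local token clustering.

```python
import numpy as np

def context_cluster_window(tokens, num_clusters=4):
    """Cluster the tokens of one local window by cosine similarity to
    center proposals, then aggregate each token with its assigned
    cluster center. A hypothetical sketch of window-level context
    clustering; `tokens` is an (N, C) array of N flattened tokens.
    """
    # Center proposals: mean-pool contiguous token groups
    # (a simple stand-in for a learned center-proposal step).
    groups = np.array_split(tokens, num_clusters, axis=0)
    centers = np.stack([g.mean(axis=0) for g in groups])  # (K, C)

    # Cosine similarity between every token and every center.
    t = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + 1e-6)
    z = centers / (np.linalg.norm(centers, axis=1, keepdims=True) + 1e-6)
    sim = t @ z.T                # (N, K)
    assign = sim.argmax(axis=1)  # hard assignment of each token to a cluster

    # Aggregate: blend each token with its cluster center,
    # weighted by its similarity to that center.
    out = tokens.copy()
    for k in range(num_clusters):
        mask = assign == k
        if mask.any():
            out[mask] = tokens[mask] + sim[mask, k:k + 1] * centers[k]
    return out, assign
```

In a full model, each window's clustered tokens would then be flattened and passed to the Mamba blocks, so that short-range structure is captured by the clustering while long-range interactions are handled by the state-space scan.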