Medical image segmentation plays an important role in many clinical applications, and deep learning has become the predominant approach to automated segmentation of volumetric medical images. 2.5D-based segmentation models combine the computational efficiency of 2D-based models with the spatial perception capabilities of 3D-based models. However, prevailing 2.5D-based models often treat each slice equally and fail to effectively learn and exploit inter-slice information, resulting in suboptimal segmentation performance. In this paper, a novel Momentum encoder-based inter-slice fusion transformer (MOSformer) is proposed to overcome this issue by leveraging inter-slice information in multi-scale feature maps extracted by different encoders. Specifically, dual encoders are employed to enhance feature distinguishability among different slices, and one of the encoders is updated by moving average to maintain the consistency of slice representations. Moreover, an IF-Swin transformer module is developed to fuse inter-slice multi-scale features. MOSformer is evaluated on three benchmark datasets (Synapse, ACDC, and AMOS), where it establishes a new state of the art with Dice similarity coefficients (DSC) of 85.63%, 92.19%, and 85.43%, respectively. These results indicate its competitiveness in medical image segmentation. Code and models for MOSformer will be made publicly available upon acceptance.
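The abstract says only that one of the two encoders is "moving-averaged". A minimal sketch of what such an update commonly looks like is given below, assuming the usual exponential moving average form, momentum <- m * momentum + (1 - m) * online. The function name `momentum_update`, the coefficient `m`, and the toy parameter names are hypothetical and are not taken from the paper, which may use a different rule or schedule.

```python
from typing import Dict, List

# Toy stand-in for network weights: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def momentum_update(online: Params, momentum: Params, m: float = 0.9) -> None:
    """Update `momentum` in place as m * momentum + (1 - m) * online.

    Assumption: a plain exponential moving average; the paper's exact
    update rule and coefficient are not stated in the abstract.
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must lie in [0, 1]")
    for name, online_values in online.items():
        momentum[name] = [
            m * old + (1.0 - m) * new
            for old, new in zip(momentum[name], online_values)
        ]


# Hypothetical parameters, chosen only to make the arithmetic easy to follow.
online = {"conv.weight": [1.0, 2.0], "conv.bias": [0.5]}
momentum = {"conv.weight": [0.0, 0.0], "conv.bias": [0.0]}

momentum_update(online, momentum, m=0.5)
print(momentum)
```

With a coefficient close to 1, the momentum encoder changes slowly from step to step, which is one plausible reading of how the averaged encoder keeps slice representations consistent.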
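The reported scores are Dice similarity coefficients, defined for two binary masks A and B as DSC = 2|A ∩ B| / (|A| + |B|). The sketch below computes it for flattened binary masks. The function name `dice_coefficient` and the handling of two empty masks are assumptions made for illustration; the abstract does not describe the evaluation protocol (for example, per-class or per-volume averaging), so this is not the paper's evaluation code.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice similarity coefficient between two flattened binary (0/1) masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    # For 0/1 masks, the element-wise product counts the overlapping voxels.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Toy masks: one overlapping voxel, two foreground voxels in each mask.
pred_mask = [1, 1, 0, 0]
target_mask = [1, 0, 1, 0]
score = dice_coefficient(pred_mask, target_mask)
print(f"DSC = {score:.2%}")
```

A score of 85.63% in this notation corresponds to a coefficient of 0.8563, typically averaged over organs and test cases.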