Precise segmentation of medical images is fundamental to extracting critical clinical information, which in turn supports more accurate diagnoses, effective treatment planning, and better patient outcomes. Convolutional Neural Networks (CNNs) and non-local attention methods have both achieved notable success in medical image segmentation, but each has a weakness: CNNs struggle to capture long-range spatial dependencies because they rely on local features, while global attention mechanisms that address this issue face significant computational and feature integration challenges. To overcome these limitations, we propose a novel architecture, Perspective+ Unet. The framework has three major innovations: (i) It introduces a dual-pathway strategy at the encoder stage that combines the outputs of traditional and dilated convolutions. This keeps the original local receptive field while adding a significantly larger one, enabling better comprehension of the global structure of images while retaining sensitivity to detail. (ii) It incorporates an efficient non-local transformer block, named ENLTB, which uses kernel function approximation to capture long-range dependencies with linear computational and space complexity. (iii) It employs a Spatial Cross-Scale Integrator strategy to merge global dependencies and local contextual cues across model stages, refining features from different levels to harmonize global and local information. Experimental results on the ACDC and Synapse datasets demonstrate the effectiveness of the proposed Perspective+ Unet. The code is available in the supplementary material.
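To make the dual-pathway idea in point (i) concrete, the following is a minimal one-dimensional sketch, not the authors' implementation. It assumes an odd kernel size, zero padding, and fusion of the two paths by simple addition; the abstract does not specify how the outputs are combined, and the names `conv1d_same` and `dual_path` are hypothetical.

```python
import numpy as np

def conv1d_same(x, kernel, dilation=1):
    """Cross-correlate x with kernel, zero-padded so the output has len(x).

    Assumes an odd kernel length so the padding is symmetric.
    """
    k = len(kernel)
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, pad)
    out = np.zeros(len(x), dtype=float)
    for i in range(len(x)):
        for j in range(k):
            # Taps are spaced `dilation` samples apart.
            out[i] += kernel[j] * xp[i + j * dilation]
    return out

def dual_path(x, kernel, dilation=2):
    """Combine a standard and a dilated convolution of the same input."""
    local = conv1d_same(x, kernel, dilation=1)        # fine detail
    wide = conv1d_same(x, kernel, dilation=dilation)  # wider context
    return local + wide  # fusion by addition is an assumption

# A unit impulse shows which output positions each path can reach.
x = np.zeros(9)
x[4] = 1.0
kernel = np.ones(3)
y = dual_path(x, kernel, dilation=2)
print(np.nonzero(y)[0])  # positions influenced by the impulse
```

With a 3-tap kernel, the standard path alone reaches one sample on each side of the impulse, while the dilated path reaches two samples away with the same number of weights. Summing them covers the wider span without losing the immediate neighbours.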
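The linear-complexity claim in point (ii) rests on a general property of kernelized attention, which the sketch below illustrates. It is not the ENLTB block itself: the feature map `elu(x) + 1` is an assumed stand-in for whatever kernel approximation the authors use, and the function names are hypothetical. The point is only that, once the similarity is written as a product of feature maps, the matrix products can be reordered so that no n-by-n attention matrix is ever formed.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1, a positive feature map; an assumed choice for illustration.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(q, k, v):
    """Kernelized attention in O(n) time and memory with respect to length n."""
    qf, kf = feature_map(q), feature_map(k)  # (n, d) each
    kv = kf.T @ v                            # (d, d_v), summed over positions
    z = kf.sum(axis=0)                       # (d,), normalizer statistics
    return (qf @ kv) / (qf @ z)[:, None]     # (n, d_v)

def quadratic_attention(q, k, v):
    """Same result, computed by forming the full (n, n) weight matrix."""
    qf, kf = feature_map(q), feature_map(k)
    weights = qf @ kf.T
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((16, 4))
k = rng.standard_normal((16, 4))
v = rng.standard_normal((16, 3))

# The two orderings should agree up to floating-point error.
print(np.allclose(linear_attention(q, k, v), quadratic_attention(q, k, v)))
```

The linear version stores only a d-by-d_v summary of the keys and values, so its cost grows linearly with the number of positions. The quadratic version is included solely as a reference to check the reordering.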