MCPA: Multi-scale Cross Perceptron Attention Network for 2D Medical Image Segmentation

The UNet architecture, based on Convolutional Neural Networks (CNNs), has demonstrated remarkable performance in medical image analysis. However, it struggles to capture long-range dependencies because of the limited receptive field and inherent inductive bias of convolutional operations. Recently, numerous Transformer-based techniques have been incorporated into the UNet architecture to overcome this limitation by capturing global feature correlations. Integrating Transformer modules, however, may cause local contextual information to be lost during global feature fusion. To address both problems, we propose a 2D medical image segmentation model called the Multi-scale Cross Perceptron Attention Network (MCPA). MCPA consists of three main components: an encoder, a decoder, and a Cross Perceptron. The Cross Perceptron first captures local correlations using multiple Multi-scale Cross Perceptron modules, which facilitate the fusion of features across scales. The resulting multi-scale feature vectors are then spatially unfolded, concatenated, and fed through a Global Perceptron module to model global dependencies. Furthermore, we introduce a Progressive Dual-branch Structure for segmenting images that contain finer tissue structures. This structure gradually shifts the segmentation focus of MCPA training from large-scale structural features to finer pixel-level features. We evaluate the proposed MCPA model on several publicly available medical image datasets covering different tasks and imaging devices, including large-scale open datasets for CT (Synapse), MRI (ACDC), fundus camera (DRIVE, CHASE_DB1, HRF), and OCTA (ROSE). The experimental results show that MCPA achieves state-of-the-art performance on these datasets. The code is available at https://github.com/simonustc/MCPA-for-2D-Medical-Image-Segmentation.
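The "spatially unfolded, concatenated, and fed through a Global Perceptron" step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the internals of the Global Perceptron, so plain single-head self-attention is assumed here as a stand-in, and the three feature-map sizes, the shared channel width, and all function names are hypothetical.

```python
import numpy as np

def unfold_to_tokens(feature_map):
    """Flatten a (C, H, W) feature map into (H*W, C) tokens, one per spatial position."""
    c, h, w = feature_map.shape
    return feature_map.reshape(c, h * w).T

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def global_self_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention over all tokens (assumed stand-in for the Global Perceptron)."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
channels = 8  # assumption: every scale was already projected to a shared channel width

# Three hypothetical encoder scales with spatial sizes 16x16, 8x8, and 4x4.
scales = [rng.standard_normal((channels, s, s)) for s in (16, 8, 4)]

# Unfold each scale into tokens and concatenate them along the token axis.
tokens = np.concatenate([unfold_to_tokens(f) for f in scales], axis=0)

# Random projection weights; in a trained network these would be learned.
w_q, w_k, w_v = (rng.standard_normal((channels, channels)) * 0.1 for _ in range(3))

# Every token can now attend to every position at every scale.
fused = global_self_attention(tokens, w_q, w_k, w_v)

print(tokens.shape, fused.shape)  # (336, 8) (336, 8)
```

The token count is 16*16 + 8*8 + 4*4 = 336, so attention here spans all positions of all scales at once, which is the sense in which the concatenated sequence lets a single module model global dependencies. Splitting `fused` back at offsets 256 and 320 would recover one token block per scale for a decoder to consume.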
