Medical image segmentation is crucial for diagnosis and treatment planning. Traditional CNN-based models such as U-Net have shown promising results but struggle to capture long-range dependencies and global context. To address these limitations, we propose a transformer-based architecture that jointly applies Channel Attention and Pyramid Attention mechanisms to improve multi-scale feature extraction and segmentation performance on medical images. Because a more complex model requires more training data, we additionally use CutMix data augmentation to improve generalization. We evaluate our approach on the Synapse multi-organ segmentation dataset, where it achieves a 6.9% improvement in mean Dice score and a 39.9% improvement in 95th-percentile Hausdorff Distance (HD95) over an implementation without our enhancements. The proposed model segments complex anatomical structures more accurately and outperforms existing state-of-the-art methods.
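To make the CutMix augmentation step concrete, the following is a minimal sketch of how it is commonly adapted to segmentation: a random rectangle is copied from a second training sample into the first, and the same rectangle is copied in the label mask so that image and labels stay aligned. This is an illustration under stated assumptions, not the implementation evaluated above. The function name `cutmix_segmentation`, the Beta parameter `alpha`, and the 2D slice shapes are all hypothetical choices.

```python
import numpy as np


def cutmix_segmentation(img_a, mask_a, img_b, mask_b, rng, alpha=1.0):
    """Paste one random rectangle from sample B into sample A.

    Assumes images are (H, W) or (H, W, C) arrays and masks are (H, W)
    integer label maps with the same spatial size. Hypothetical helper.
    """
    h, w = mask_a.shape[:2]

    # Mixing ratio: lam is the fraction of sample A that is kept.
    lam = rng.beta(alpha, alpha)
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))

    # Random box centre; the box is clipped to the image bounds.
    cy = int(rng.integers(0, h))
    cx = int(rng.integers(0, w))
    y0, y1 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x0, x1 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)

    # Copy so the original samples are left untouched.
    img, mask = img_a.copy(), mask_a.copy()
    img[y0:y1, x0:x1] = img_b[y0:y1, x0:x1]
    mask[y0:y1, x0:x1] = mask_b[y0:y1, x0:x1]
    return img, mask, (y0, y1, x0, x1)
```

Unlike CutMix for classification, where the two labels are blended in proportion to the pasted area, the pixel-wise mask here is edited directly, so no label-mixing weight is needed in the loss.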