Medical image segmentation, a critical application of semantic segmentation in healthcare, has advanced significantly through specialized computer vision techniques. Although deep learning-based medical image segmentation is essential for assisting medical diagnosis, the lack of diverse training data leads to the long-tail problem. Moreover, most previous hybrid CNN-ViT architectures have a limited ability to combine different attention mechanisms across the layers of the Convolutional Neural Network. To address these issues, we propose a Lagrange Duality Consistency (LDC) Loss, which, together with a Boundary-Aware Contrastive Loss, forms the overall training objective for semi-supervised learning and mitigates the long-tail problem. Additionally, we introduce CMAformer, a novel network that combines the strengths of ResUNet and Transformer. The cross-attention block in CMAformer integrates spatial attention and channel attention for multi-scale feature fusion. Overall, our results indicate that CMAformer, combined with the feature fusion framework and the new consistency loss, shows strong complementarity in semi-supervised learning ensembles, achieving state-of-the-art results on multiple public medical image datasets. Example code is available at: \url{https://github.com/lzeeorno/Lagrange-Duality-and-CMAformer}.
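The abstract does not specify how CMAformer's cross-attention block combines spatial and channel attention. As a rough illustration only, the sketch below shows one well-known way to apply both kinds of attention to a single feature map: the sequential channel-then-spatial scheme popularized by CBAM. It is an assumption, not the authors' block. The helper names (`channel_attention`, `spatial_attention`, `fused_attention`) and the weight shapes are hypothetical, and the code operates on a single unbatched `(C, H, W)` NumPy array.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x, w1, w2):
    """Reweight channels of x (C, H, W) using a shared bottleneck MLP.

    Hypothetical CBAM-style sketch: w1 is (C_r, C), w2 is (C, C_r).
    """
    avg = x.mean(axis=(1, 2))  # (C,) average-pooled descriptor
    mx = x.max(axis=(1, 2))    # (C,) max-pooled descriptor

    def mlp(v):
        # Two-layer MLP with ReLU, shared by both descriptors.
        return w2 @ np.maximum(w1 @ v, 0.0)

    weights = sigmoid(mlp(avg) + mlp(mx))  # (C,) per-channel gate in (0, 1)
    return x * weights[:, None, None]


def spatial_attention(x, kernel):
    """Reweight spatial positions of x (C, H, W) with a k x k convolution.

    Hypothetical sketch: kernel is (2, k, k) with odd k, applied to the
    channel-wise mean and max maps (zero padding keeps the H x W size).
    """
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    logits = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return x * sigmoid(logits)[None, :, :]


def fused_attention(x, w1, w2, kernel):
    """Apply channel attention, then spatial attention (CBAM ordering)."""
    return spatial_attention(channel_attention(x, w1, w2), kernel)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((8, 16, 16))   # C=8 channels, 16x16 map
    w1 = rng.standard_normal((2, 8)) * 0.1     # bottleneck C_r = 2
    w2 = rng.standard_normal((8, 2)) * 0.1
    kernel = rng.standard_normal((2, 7, 7)) * 0.1
    print(fused_attention(feats, w1, w2, kernel).shape)  # (8, 16, 16)
```

Because both gates pass through a sigmoid, each stage only rescales the input features, so the output keeps the input's shape. A multi-scale variant would apply such a block to feature maps from several encoder levels before fusing them. How CMAformer actually does this is described in the paper and repository, not here.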