Deep learning, especially convolutional neural networks (CNNs) and Transformer architectures, has become the focus of extensive research in medical image segmentation, achieving impressive results. However, CNNs carry inductive biases that limit their effectiveness in more complex, varied segmentation scenarios. Conversely, while Transformer-based methods excel at capturing global and long-range semantic detail, they suffer from high computational demands. In this study, we propose CSWin-UNet, a novel U-shaped segmentation method that incorporates the CSWin self-attention mechanism into the UNet architecture to perform self-attention within horizontal and vertical stripes. This design improves both computational efficiency and receptive-field interaction. Additionally, our decoder employs a content-aware reassembly operator that reassembles features under the guidance of predicted kernels for precise restoration of image resolution. Extensive empirical evaluations on diverse datasets, including Synapse multi-organ CT, cardiac MRI, and skin lesion images, demonstrate that CSWin-UNet maintains low model complexity while delivering high segmentation accuracy.
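For intuition, the stripe-based attention named above can be sketched as follows: half the attention heads attend within horizontal stripes spanning the full image width, the other half within vertical stripes spanning the full height, so together the heads cover a cross-shaped receptive field at a cost that grows only with stripe width. The PyTorch sketch below is a minimal illustration under our own assumptions; the class `StripeAttention`, its parameters (`stripe_width`, `vertical`), and the half-and-half head split are hypothetical names for exposition, not the authors' released implementation (which includes further components such as positional encoding).

```python
# Minimal sketch of stripe-based self-attention in the spirit of CSWin.
# Illustrative only: names and the head split are assumptions, not the
# paper's official code. Requires H and W divisible by stripe_width.
import torch
import torch.nn as nn

class StripeAttention(nn.Module):
    """Multi-head self-attention restricted to horizontal or vertical stripes."""
    def __init__(self, dim, num_heads, stripe_width, vertical=False):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.sw = stripe_width
        self.vertical = vertical
        self.qkv = nn.Linear(dim, dim * 3)

    def forward(self, x):  # x: (B, H, W, C)
        B, H, W, C = x.shape
        if self.vertical:                # vertical stripes: swap H and W
            x = x.transpose(1, 2)
            H, W = W, H
        # Partition into horizontal stripes of height sw spanning the width,
        # then flatten each stripe into a token sequence.
        x = x.reshape(B, H // self.sw, self.sw, W, C)
        x = x.reshape(B * (H // self.sw), self.sw * W, C)
        qkv = self.qkv(x).reshape(x.shape[0], x.shape[1], 3,
                                  self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)   # each: (B*, heads, N, C/heads)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(
            x.shape[0], x.shape[1], C)
        # Undo the stripe partition to recover the spatial layout.
        out = out.reshape(B, H // self.sw, self.sw, W, C).reshape(B, H, W, C)
        if self.vertical:
            out = out.transpose(1, 2)
        return out

# Cross-shaped window: half the channels (heads) attend in horizontal
# stripes, the other half in vertical stripes, then both are concatenated.
x = torch.randn(1, 32, 32, 64)                 # (B, H, W, C)
h_attn = StripeAttention(dim=32, num_heads=2, stripe_width=4)
v_attn = StripeAttention(dim=32, num_heads=2, stripe_width=4, vertical=True)
y = torch.cat([h_attn(x[..., :32]), v_attn(x[..., 32:])], dim=-1)
```

Because each token only attends within its stripe rather than over the whole image, the attention cost scales with the stripe width instead of the full spatial extent, which is the efficiency gain the abstract refers to.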