Recently, the Vision Transformer (ViT) has been widely used in medical image segmentation (MIS) because its self-attention mechanism models global knowledge in the spatial domain. However, many studies have focused on improving models in the spatial domain while neglecting frequency-domain information. We therefore propose the Multi-axis External Weights UNet (MEW-UNet), a U-shaped architecture that replaces the self-attention in ViT with our Multi-axis External Weights block. Specifically, the block applies a Fourier transform along the three axes of the input features and applies external weights, produced by our External Weights Generator, in the frequency domain. An inverse Fourier transform then maps the features back to the spatial domain. We evaluate our model on four datasets (Synapse, ACDC, ISIC17, and ISIC18), where it achieves competitive performance, which we attribute to its effective use of frequency-domain information.
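The transform, weight, and inverse-transform sequence described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a single (C, H, W) feature tensor, a 1D FFT along each axis in turn, and a simple average of the three branches. The names `axis_frequency_filter` and `multi_axis_block` are hypothetical, and the random complex weights only stand in for the output of the External Weights Generator, whose design the abstract does not specify.

```python
import numpy as np


def axis_frequency_filter(x, weight, axis):
    """FFT along one axis, weight in the frequency domain, inverse FFT."""
    freq = np.fft.fft(x, axis=axis)      # spatial -> frequency along `axis`
    freq = freq * weight                 # element-wise external weight
    # Back to the spatial domain; keeping only the real part is an assumption.
    return np.fft.ifft(freq, axis=axis).real


def multi_axis_block(x, weights):
    """Filter a (C, H, W) tensor along each of its three axes.

    Averaging the three branches is an assumption made for this sketch.
    """
    branches = [
        axis_frequency_filter(x, w, axis) for axis, w in enumerate(weights)
    ]
    return np.mean(branches, axis=0)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))     # hypothetical (C, H, W) features

# All-ones weights leave the spectrum untouched, so the block is an identity.
identity_weights = [np.ones(x.shape, dtype=complex) for _ in range(3)]
out_identity = multi_axis_block(x, identity_weights)

# Random complex weights stand in for generated external weights.
random_weights = [
    rng.standard_normal(x.shape) + 1j * rng.standard_normal(x.shape)
    for _ in range(3)
]
out_random = multi_axis_block(x, random_weights)
```

With all-ones weights the block returns its input unchanged, which is a convenient sanity check that the forward and inverse transforms are paired correctly. Any other weights reshape the spectrum along each axis while preserving the tensor shape.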