Models built on U-shaped architectures have improved the performance of medical image segmentation. However, the single-layer decoder of U-Net is too "thin" to exploit enough information, which leaves a large semantic gap between the encoder and decoder features. The problem worsens when training data are scarce, which is common in medical image processing because annotated data are harder to obtain than in other tasks. Based on this observation, we propose MS-UNet, a novel U-Net model for medical image segmentation. Instead of the single-layer U-Net decoder used in Swin-UNet and TransUnet, we design a multi-scale nested decoder based on the Swin Transformer. The proposed decoder brings the feature maps of the decoder and encoder semantically closer, enabling the network to learn more detailed features. In addition, we propose a novel edge loss and a plug-and-play fine-tuning denoising module, which not only improve the segmentation performance of MS-UNet but can also be applied individually to other models. Experimental results show that MS-UNet learns features more efficiently and achieves better segmentation performance, especially in the extreme case of very little training data, and that the proposed edge loss and denoising module significantly enhance the segmentation performance of MS-UNet.
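The abstract does not give the form of the edge loss, so the following is only a minimal sketch of one common way an edge-aware term can be built: extract the boundary of the predicted and ground-truth masks and penalise their mismatch with a Dice loss. The function names (`edge_map`, `dice_loss`, `edge_dice_loss`), the 4-neighbourhood boundary rule, and the use of hard binary masks are all assumptions made for illustration, not the formulation used in MS-UNet.

```python
# Illustrative edge-aware loss on binary masks (an assumption, not the
# MS-UNet edge loss, whose definition is not stated in the abstract).

def edge_map(mask):
    """Mark foreground pixels that touch background or the image border.

    `mask` is a list of rows containing 0 (background) or 1 (foreground).
    A 4-neighbourhood is used to decide whether a pixel lies on the boundary.
    """
    h, w = len(mask), len(mask[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < h and 0 <= nx < w)
                if outside or not mask[ny][nx]:
                    edges[y][x] = 1
                    break
    return edges


def dice_loss(a, b, eps=1e-6):
    """Return 1 - Dice coefficient between two binary maps of equal shape."""
    intersection = sum(p * q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    total = sum(sum(row) for row in a) + sum(sum(row) for row in b)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


def edge_dice_loss(pred_mask, true_mask):
    """Dice loss computed only on the boundaries of the two masks."""
    return dice_loss(edge_map(pred_mask), edge_map(true_mask))


if __name__ == "__main__":
    truth = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    # Prediction that misses the right-hand column of the object.
    pred = [row[:] for row in truth]
    for y in (1, 2, 3):
        pred[y][3] = 0
    print("edge loss, perfect prediction:", edge_dice_loss(truth, truth))
    print("edge loss, shifted boundary:  ", edge_dice_loss(pred, truth))
```

In training, such a term would normally be computed on soft probability maps with a differentiable edge operator and added to a region loss such as cross-entropy or Dice. The hard-mask version here only shows how a boundary error is penalised separately from a region error.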