Deep learning has advanced medical image segmentation considerably in recent years, particularly through convolutional neural networks (CNNs) and transformer models. However, CNNs are limited in capturing long-range dependencies, while transformers suffer from high computational complexity. To address both issues, we propose RWKV-UNet, a novel model that integrates the RWKV (Receptance Weighted Key Value) structure into the U-Net architecture. This integration strengthens the model's ability to capture long-range dependencies and improves its contextual understanding, both of which are crucial for accurate medical image segmentation. We build a strong encoder from Global-Local Spatial Perception (GLSP) blocks that combine CNNs and RWKVs. We also propose a Cross-Channel Mix (CCM) module that improves the skip connections through multi-scale feature fusion, integrating channel information globally. Experiments on 11 benchmark datasets show that RWKV-UNet achieves state-of-the-art performance on various types of medical image segmentation tasks. Additionally, the smaller variants RWKV-UNet-S and RWKV-UNet-T balance accuracy and computational efficiency, making them suitable for broader clinical applications.
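The complexity argument in the abstract can be made concrete with a little arithmetic. The sketch below is only an illustration, not part of RWKV-UNet: it assumes an image is split into non-overlapping square patches (the patch size of 16 and the image sizes are hypothetical choices), counts the resulting tokens N, and compares the N × N token-pair interactions of full self-attention with the N steps of a mixer whose cost grows linearly in sequence length, which is how RWKV-style models are generally described. Constant factors and the hidden dimension are ignored.

```python
def token_count(height, width, patch):
    """Number of tokens when an image is cut into non-overlapping patches."""
    return (height // patch) * (width // patch)


def pairwise_interactions(n_tokens):
    """Token-pair interactions in full self-attention: grows as N^2."""
    return n_tokens * n_tokens


def linear_steps(n_tokens):
    """Per-token steps in a linear-time mixer: grows as N."""
    return n_tokens


if __name__ == "__main__":
    # Hypothetical image sizes and patch size, chosen only for illustration.
    for size in (224, 512, 1024):
        n = token_count(size, size, patch=16)
        ratio = pairwise_interactions(n) // linear_steps(n)
        print(f"{size}x{size}: tokens={n}, "
              f"quadratic={pairwise_interactions(n)}, "
              f"linear={linear_steps(n)}, ratio={ratio}")
```

Under these assumptions the gap between the two costs is N itself, so it widens as resolution increases, which is the motivation for seeking alternatives to full attention on high-resolution medical images.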