Medical image segmentation is critical for accurate diagnostics and treatment planning, but it remains challenging due to complex anatomical structures and limited annotated training data. CNN-based segmentation methods excel at local feature extraction but struggle to model long-range dependencies. Transformers, by contrast, capture global context more effectively, but are inherently data-hungry and computationally expensive. In this work, we introduce UKAST, a U-Net-like architecture that integrates rational-function-based Kolmogorov-Arnold Networks (KANs) into Swin Transformer encoders. By leveraging rational base functions and Group Rational KANs (GR-KANs) from the Kolmogorov-Arnold Transformer (KAT), our architecture addresses the inefficiencies of vanilla spline-based KANs, yielding a more expressive and data-efficient framework with reduced FLOPs and only a marginal increase in parameter count compared to SwinUNETR. UKAST achieves state-of-the-art performance on four diverse 2D and 3D medical image segmentation benchmarks, consistently surpassing both CNN- and Transformer-based baselines. Notably, it attains superior accuracy in data-scarce settings, alleviating the data-hungry limitations of standard Vision Transformers. These results show the potential of KAN-enhanced Transformers to advance data-efficient medical image segmentation. Code is available at: https://github.com/nsapkota417/UKAST
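To make the GR-KAN idea concrete: instead of learning a separate spline per edge as in vanilla KANs, a group rational layer applies a learnable rational function P(x)/Q(x) whose coefficients are shared across a group of channels, which cuts parameters and FLOPs. The following is a minimal NumPy sketch of this idea under our own simplifying assumptions (the safe denominator form and the function/argument names are illustrative, not the authors' implementation):

```python
import numpy as np

def group_rational(x, a, b):
    """Rational activation P(x)/Q(x) with coefficients shared per channel group.

    x: (N, C) input; a: (G, m+1) numerator coeffs; b: (G, n) denominator coeffs.
    Channels are split into G groups of C // G; each group shares one rational
    function. Q(x) = 1 + sum_k |b_k| * |x|^(k+1) keeps the denominator >= 1,
    so the function has no poles (one common "safe" rational parameterization).
    """
    N, C = x.shape
    G = a.shape[0]
    assert C % G == 0, "channel count must be divisible by the group count"
    xg = x.reshape(N, G, C // G)
    # P(x): polynomial in x with per-group coefficients a[g, k]
    num = sum(a[:, k][None, :, None] * xg**k for k in range(a.shape[1]))
    # Q(x): pole-free denominator, always >= 1
    den = 1 + sum(np.abs(b[:, k])[None, :, None] * np.abs(xg)**(k + 1)
                  for k in range(b.shape[1]))
    return (num / den).reshape(N, C)

# tiny example: 4 tokens, 8 channels in 2 groups, degree-3 / degree-2 rational
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
a = rng.standard_normal((2, 4)) * 0.1  # numerator coefficients per group
b = rng.standard_normal((2, 2)) * 0.1  # denominator coefficients per group
y = group_rational(x, a, b)
print(y.shape)  # (4, 8)
```

With G groups rather than per-edge functions, the coefficient count is G x (m + n + 1) instead of scaling with the full channel count, which is why the rational formulation stays cheap enough to embed inside every Swin Transformer block.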