Deeply stacked Kolmogorov-Arnold Networks (KANs) are practically infeasible due to severe training difficulty and substantial memory requirements. Consequently, existing studies can incorporate only a few KAN layers, hindering comprehensive exploration of KANs. This study overcomes these limitations and introduces the first fully KA-based deep model, demonstrating that KA-based layers can entirely replace traditional architectures in deep learning while achieving superior learning capacity. Specifically, (1) the proposed Share-activation KAN (SaKAN) reformulates Sprecher's variant of the Kolmogorov-Arnold representation theorem to ease training difficulty; its simplified parameterization and denser training samples per activation lead to better optimization. (2) We show that spline gradients contribute negligibly to training while consuming substantial GPU memory, and therefore propose the Grad-Free Spline, which significantly reduces memory usage and computational overhead. (3) Building on these two innovations, our ALL U-KAN is the first representative implementation of a fully KA-based deep model, in which the proposed KA and KAonv layers completely replace FC and Conv layers. Extensive evaluations on three medical image segmentation tasks confirm the superiority of the fully KA-based architecture over partially KA-based and traditional architectures, achieving consistently higher segmentation accuracy. Compared with a directly deep-stacked KAN, ALL U-KAN reduces the parameter count by a factor of 10 and memory consumption by more than a factor of 20, unlocking new explorations into deep KAN architectures.
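The intuition behind a grad-free spline can be sketched in a few lines: when fitting spline coefficients, the gradient of the loss with respect to the coefficients depends only on the forward basis values, so no gradient ever needs to flow *through* the spline evaluation. This is a minimal numpy illustration under assumed simplifications (an order-1 "hat" basis instead of the paper's splines, plain gradient descent, and the function names `hat_basis` / `grad_free_spline_step` are hypothetical, not from the paper):

```python
import numpy as np

def hat_basis(x, knots):
    """Order-1 B-spline (hat-function) basis evaluated at points x."""
    B = np.zeros((len(x), len(knots)))
    for i, t in enumerate(knots):
        left = knots[i - 1] if i > 0 else t - 1.0
        right = knots[i + 1] if i < len(knots) - 1 else t + 1.0
        # Rises linearly from `left` to 1 at knot t, falls to 0 at `right`.
        B[:, i] = np.clip(np.minimum((x - left) / (t - left),
                                     (right - x) / (right - t)), 0.0, None)
    return B

def grad_free_spline_step(x, coeffs, knots, target, lr=0.1):
    """One descent step on 0.5*mean((B @ coeffs - target)^2).

    The basis B is treated as a constant: the coefficient gradient is
    just B^T @ residual / n, so no autograd graph through the spline
    evaluation is ever stored -- the "grad-free" idea in miniature.
    """
    B = hat_basis(x, knots)
    resid = B @ coeffs - target
    grad = B.T @ resid / len(x)
    loss = float(0.5 * np.mean(resid ** 2))
    return coeffs - lr * grad, loss
```

In an autograd framework the same effect would come from evaluating the basis under a stop-gradient (e.g. detaching the basis tensor), which is what saves the activation memory of the spline subgraph.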