Kolmogorov-Arnold Networks (KANs) have recently been introduced as a flexible alternative to multi-layer perceptron (MLP) architectures. In this paper, we examine the training dynamics of different KAN architectures and compare them with corresponding MLP formulations. We train with a variety of initialization schemes, optimizers, and learning rates, and also apply backpropagation-free approaches such as the HSIC Bottleneck. We find that, when judged by test accuracy, KANs are an effective alternative to MLP architectures on high-dimensional datasets and offer somewhat better parameter efficiency, but suffer from more unstable training dynamics. Finally, we provide recommendations for improving the training stability of larger KAN models.