In deep learning, the Kolmogorov-Arnold Network (KAN) has emerged as a potential alternative to multilayer perceptrons (MLPs). However, its applicability to vision tasks has not been extensively validated. In this study, we demonstrate the effectiveness of KAN for vision tasks through multiple trials on the MNIST, CIFAR10, and CIFAR100 datasets, using a training batch size of 32. Our results show that KAN outperformed the original MLP-Mixer on CIFAR10 and CIFAR100 but performed slightly worse than the state-of-the-art ResNet-18. These findings suggest that KAN holds significant promise for vision tasks and that further modifications could improve its performance in future evaluations. Our contributions are threefold: first, we showcase the efficiency of KAN-based algorithms for visual tasks; second, we provide extensive empirical assessments across various vision benchmarks, comparing KAN's performance with MLP-Mixer, CNNs, and Vision Transformers (ViT); and third, we pioneer the use of natural KAN layers in visual tasks, addressing a gap in previous research. This paper lays the foundation for future studies on KANs, highlighting their potential as a reliable alternative for image classification tasks.