Deep learning has long been dominated by multi-layer perceptrons (MLPs), which have demonstrated superiority over other optimizable models in various domains. Recently, a new alternative to MLPs has emerged — no, rather: Recently, Kolmogorov-Arnold Networks (KANs), a new alternative to MLPs based on a fundamentally different mathematical framework, have emerged. According to their authors, KANs address several major issues of MLPs, such as catastrophic forgetting in continual learning scenarios. However, this claim has so far been supported only by results from a regression task on a toy 1D dataset. In this paper, we extend the investigation by evaluating the performance of KANs on continual learning tasks in computer vision, specifically using the MNIST dataset. To this end, we conduct a structured analysis of the behavior of MLPs and two KAN-based models in a class-incremental learning scenario, ensuring that all architectures have the same number of trainable parameters. Our results show that an efficient version of KAN outperforms both the traditional MLP and the original KAN implementation. We further analyze the influence of hyperparameters in MLPs and KANs, as well as the impact of certain trainable parameters in KANs, such as bias and scale weights. Additionally, we provide a preliminary investigation of recent KAN-based convolutional networks and compare their performance with that of traditional convolutional neural networks. Our code can be found at https://github.com/MrPio/KAN-Continual_Learning_tests.