The present paper introduces concurrency-driven enhancements to the training algorithm for Kolmogorov-Arnold networks (KANs) based on the Newton-Kaczmarz (NK) method. As indicated by prior research, KANs trained with the NK-based approach significantly outperform classical neural networks based on multilayer perceptrons (MLPs) in terms of both accuracy and training time. Although certain parts of the algorithm, such as the evaluation of the basis functions, can be parallelised, a fundamental limitation lies in the inherently sequential computation of the update values: each update depends on the result of the previous step, which obstructs parallel execution. Nevertheless, substantial acceleration is achievable. Three complementary strategies are proposed in the present paper: (i) a pre-training procedure tailored to the structure of the NK updates, (ii) training on disjoint subsets of the data followed by model merging, employed not in the context of federated learning but as a mechanism for accelerating convergence, and (iii) a parallelisation technique suitable for execution on field-programmable gate arrays (FPGAs), which is implemented and tested directly on the device. All experimental results presented in this work are fully reproducible, with the complete source code available online.