The Kolmogorov-Arnold Network (KAN) is a network architecture recently proposed by Liu et al. (2024) that, compared to multi-layer perceptrons, offers improved interpretability and a more parsimonious design in many science-oriented tasks. This work provides a rigorous theoretical analysis of KAN by establishing generalization bounds for KANs equipped with activation functions that are either represented by linear combinations of basis functions or lie in a low-rank Reproducing Kernel Hilbert Space (RKHS). In the first case, the generalization bound accommodates various choices of basis functions in forming the activation functions in each layer of the KAN and is adapted to different operator norms at each layer. For a particular choice of operator norms, the bound scales with the $l_1$ norm of the coefficient matrices and the Lipschitz constants of the activation functions, and it has no dependence on combinatorial parameters (e.g., the number of nodes) beyond logarithmic factors. Moreover, our result does not require a boundedness assumption on the loss function and hence applies to a general class of regression-type loss functions. In the low-rank case, the generalization bound scales polynomially with the underlying ranks as well as the Lipschitz constants of the activation functions in each layer. These bounds are empirically investigated for KANs trained with stochastic gradient descent on simulated and real data sets. The numerical results demonstrate their practical relevance.
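To make the first setting concrete, the sketch below illustrates a KAN layer in which each edge carries a learnable univariate activation expressed as a linear combination of fixed basis functions, so the coefficient tensor whose $l_1$ norm enters the bound is explicit. The Gaussian RBF basis, the class name `KANLayer`, and all sizes are illustrative assumptions for this sketch (the paper's analysis accommodates various basis families), not the authors' implementation.

```python
import numpy as np

class KANLayer:
    """One KAN layer (illustrative sketch): each edge (i, j) carries a
    univariate activation phi_ij(x) = sum_m C[i, j, m] * b_m(x), where
    {b_m} is a fixed basis. Gaussian RBFs are an assumed basis choice."""

    def __init__(self, d_in, d_out, n_basis=8, seed=None):
        rng = np.random.default_rng(seed)
        # Coefficient tensor; the per-layer l1 norm of these coefficients
        # is the quantity the first generalization bound scales with.
        self.C = rng.normal(scale=0.1, size=(d_in, d_out, n_basis))
        self.centers = np.linspace(-1.0, 1.0, n_basis)  # RBF centers
        self.width = 2.0 / (n_basis - 1)                # RBF bandwidth

    def basis(self, x):
        # x: (batch, d_in) -> basis values of shape (batch, d_in, n_basis)
        z = (x[..., None] - self.centers) / self.width
        return np.exp(-0.5 * z**2)

    def forward(self, x):
        # Node j sums its incoming edge activations:
        # out[b, j] = sum_i sum_m C[i, j, m] * b_m(x[b, i])
        return np.einsum("bim,ijm->bj", self.basis(x), self.C)

# Stacking layers gives a deeper KAN f(x) = layer2(layer1(x)).
x = np.random.default_rng(0).uniform(-1, 1, size=(4, 3))
layer1, layer2 = KANLayer(3, 5, seed=1), KANLayer(5, 1, seed=2)
print(layer2.forward(layer1.forward(x)).shape)  # (4, 1)
```

Under this parameterization, smooth bounded-derivative bases such as the RBFs above keep each edge activation Lipschitz, which is the other per-layer quantity appearing in the bound.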