The need for scalable and expressive models in machine learning is paramount, particularly in applications requiring both structural depth and flexibility. Traditional deep learning methods, such as multilayer perceptrons (MLP), offer depth but lack the ability to combine the structural characteristics of deep learning architectures with the non-parametric flexibility of kernel methods. To address this, deep kernel learning (DKL) was introduced, in which the inputs to a base kernel are transformed using a deep learning architecture. These kernels can replace standard kernels, providing both expressive power and scalability. The advent of Kolmogorov-Arnold Networks (KAN) has generated considerable attention and discussion among researchers in the scientific domain. In this paper, we introduce a scalable deep kernel using KAN (DKL-KAN) as an effective alternative to DKL using MLP (DKL-MLP). Our approach jointly optimizes the network weights and base kernel hyperparameters by maximizing the marginal likelihood within a Gaussian process framework. We analyze two variants of DKL-KAN for a fair comparison with DKL-MLP: one with the same number of neurons and layers as DKL-MLP, and another with approximately the same number of trainable parameters. To handle large datasets, we use kernel interpolation for scalable structured Gaussian processes (KISS-GP) for low-dimensional inputs and KISS-GP with product kernels for high-dimensional inputs. The efficacy of DKL-KAN is evaluated in terms of computational training time and test prediction accuracy across a wide range of applications. We also examine the effectiveness of DKL-KAN in modeling discontinuities and accurately estimating prediction uncertainty. The results indicate that DKL-KAN outperforms DKL-MLP on datasets with a small number of observations, whereas DKL-MLP exhibits better scalability and higher test prediction accuracy on datasets with a large number of observations.
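The deep kernel construction described above can be illustrated with a minimal NumPy sketch: a small network warps the inputs, a standard RBF base kernel acts on the warped features, and the GP log marginal likelihood provides the single objective that jointly scores the network weights and kernel hyperparameters. The function names and fixed random parameters here are illustrative assumptions, not the paper's implementation; in practice the parameters would be trained by gradient ascent on this objective, and the MLP feature map could be replaced by a KAN.

```python
import numpy as np

def mlp_features(X, W1, b1, W2, b2):
    # Two-layer MLP feature extractor: warps inputs before the base kernel.
    H = np.tanh(X @ W1 + b1)
    return np.tanh(H @ W2 + b2)

def rbf_kernel(Z1, Z2, lengthscale=1.0, variance=1.0):
    # RBF base kernel evaluated in the learned feature space.
    sq = np.sum(Z1**2, 1)[:, None] + np.sum(Z2**2, 1)[None, :] - 2 * Z1 @ Z2.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def log_marginal_likelihood(X, y, params, noise=0.1):
    # GP log marginal likelihood: -1/2 y^T K^-1 y - 1/2 log|K| - n/2 log(2*pi),
    # with K built from the deep kernel plus observation noise.
    W1, b1, W2, b2 = params
    Z = mlp_features(X, W1, b1, W2, b2)
    K = rbf_kernel(Z, Z) + noise**2 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * len(X) * np.log(2 * np.pi))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = np.sin(X[:, 0])
params = (rng.normal(size=(3, 8)), np.zeros(8),
          rng.normal(size=(8, 2)), np.zeros(2))
lml = log_marginal_likelihood(X, y, params)
```

Because the warping and the kernel are composed into one covariance function, a single scalar objective governs every trainable quantity, which is what makes the joint optimization well defined.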