Kernel-based methods are heavily used in machine learning. However, they suffer from $O(N^2)$ complexity in the number $N$ of data points. In this paper, we propose an approximation procedure that reduces this complexity to $O(N)$. Our approach is based on two ideas. First, we prove that any radial kernel with an analytic basis function can be represented as a sliced version of some one-dimensional kernel, and we derive an analytic formula for the one-dimensional counterpart. It turns out that the relation between one- and $d$-dimensional kernels is given by a generalized Riemann-Liouville fractional integral. Hence, we can reduce the $d$-dimensional kernel summation to a one-dimensional setting. Second, to solve these one-dimensional problems efficiently, we apply fast Fourier summations on non-equispaced data, a sorting algorithm, or a combination of both. Because of its practical importance, we pay special attention to the Gaussian kernel, for which we show a dimension-independent error bound and represent its one-dimensional counterpart via a closed-form Fourier transform. We provide a runtime comparison and an error estimate for our fast kernel summations.
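To illustrate why a one-dimensional kernel sum can be evaluated without the quadratic cost, the following minimal sketch computes $s_i = \sum_j w_j |x_i - x_j|$ for all $i$ by sorting and prefix sums. The choice of the kernel $|x - y|$ is an assumption made for illustration only, since the abstract does not state which one-dimensional kernels are handled by sorting, and the function names are hypothetical rather than taken from the paper.

```python
from itertools import accumulate


def abs_kernel_sums_naive(x, w):
    """Reference O(N^2) evaluation of s_i = sum_j w_j * |x_i - x_j|."""
    return [sum(wj * abs(xi - xj) for xj, wj in zip(x, w)) for xi in x]


def abs_kernel_sums_sorted(x, w):
    """Same sums via one sort and prefix sums (illustrative sketch).

    Assumption: the 1D kernel is |x - y|; this is not claimed to be
    the paper's implementation.
    """
    n = len(x)
    if n == 0:
        return []
    # Sort the points once; everything after this is linear in n.
    order = sorted(range(n), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ws = [w[i] for i in order]
    # Prefix sums of the weights and of the weighted positions.
    cum_w = list(accumulate(ws))
    cum_wx = list(accumulate(wi * xi for wi, xi in zip(ws, xs)))
    total_w, total_wx = cum_w[-1], cum_wx[-1]
    out = [0.0] * n
    for rank, i in enumerate(order):
        # Points up to this rank satisfy x_j <= x_i, so |x_i - x_j| = x_i - x_j.
        left_w, left_wx = cum_w[rank], cum_wx[rank]
        # Points after this rank satisfy x_j >= x_i, so |x_i - x_j| = x_j - x_i.
        right_w, right_wx = total_w - left_w, total_wx - left_wx
        out[i] = xs[rank] * (left_w - right_w) - (left_wx - right_wx)
    return out


if __name__ == "__main__":
    points = [0.5, -1.0, 2.0, 0.5]
    weights = [1.0, 2.0, -0.5, 3.0]
    print(abs_kernel_sums_sorted(points, weights))
    print(abs_kernel_sums_naive(points, weights))
```

In this sketch the sort costs $O(N \log N)$ and the remaining pass is linear. In a sliced scheme of the kind the abstract describes, such a one-dimensional routine would be applied to projections of the $d$-dimensional points onto a set of directions.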