Kernel-based methods are heavily used in machine learning. However, they suffer from $O(N^2)$ complexity in the number $N$ of considered data points. In this paper, we propose an approximation procedure that reduces this complexity to $O(N)$. Our approach is based on two ideas. First, we prove that any radial kernel with an analytic basis function can be represented as a sliced version of some one-dimensional kernel, and we derive an analytic formula for this one-dimensional counterpart. It turns out that the relation between the one- and $d$-dimensional kernels is given by a generalized Riemann-Liouville fractional integral. Hence, we can reduce the $d$-dimensional kernel summation to a one-dimensional setting. Second, to solve these one-dimensional problems efficiently, we apply fast Fourier summation on non-equispaced data, a sorting algorithm, or a combination of both. Due to its practical importance, we pay special attention to the Gaussian kernel, for which we show a dimension-independent error bound and represent the one-dimensional counterpart via a closed-form Fourier transform. We provide a runtime comparison and error estimates for our fast kernel summations.
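The slicing idea above can be illustrated with a minimal sketch: project the data onto random directions and compute one-dimensional kernel sums along each projection. Note that this is only an illustration; the true one-dimensional counterpart kernel is given by the generalized Riemann-Liouville fractional integral derived in the paper, not by a plain 1D Gaussian, and the actual speedup comes from evaluating each 1D sum in near-linear time via sorting or non-equispaced fast Fourier summation. The function names and the number of projections `P` are illustrative choices, not part of the paper.

```python
import numpy as np

def naive_kernel_sum(x, w, sigma=1.0):
    """O(N^2) reference: s_j = sum_i w_i * exp(-||x_j - x_i||^2 / (2 sigma^2))."""
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2)) @ w

def sliced_kernel_sum(x, w, sigma=1.0, P=64, seed=None):
    """Monte Carlo slicing sketch: average 1D kernel sums over P random
    directions on the unit sphere.

    CAVEAT: a 1D Gaussian stands in for the paper's true one-dimensional
    counterpart kernel, and the 1D sums below are still computed naively;
    in the actual method each 1D sum costs O(N) or O(N log N) via
    sorting or fast Fourier summation on non-equispaced data.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    xi = rng.standard_normal((P, d))
    xi /= np.linalg.norm(xi, axis=1, keepdims=True)  # unit directions
    out = np.zeros(len(x))
    for v in xi:
        p = x @ v  # 1D projections of all points onto direction v
        d2 = (p[:, None] - p[None, :]) ** 2
        out += np.exp(-d2 / (2 * sigma ** 2)) @ w
    return out / P
```

For $d = 1$ every unit direction is $\pm 1$ and projection preserves pairwise distances, so the sliced sum coincides exactly with the naive sum; for $d > 1$ the quality of the approximation depends on using the correct one-dimensional counterpart kernel and on the number of slicing directions.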