Kernel-based methods are heavily used in machine learning. However, they suffer from $O(N^2)$ complexity in the number $N$ of considered data points. In this paper, we propose an approximation procedure that reduces this complexity to $O(N)$. Our approach is based on two ideas. First, we prove that any radial kernel with an analytic basis function can be represented as a sliced version of some one-dimensional kernel, and we derive an analytic formula for the one-dimensional counterpart. It turns out that the relation between one- and $d$-dimensional kernels is given by a generalized Riemann-Liouville fractional integral. Hence, we can reduce the $d$-dimensional kernel summation to a one-dimensional setting. Second, to solve these one-dimensional problems efficiently, we apply fast Fourier summations on non-equispaced data, a sorting algorithm, or a combination of both. Because of its practical importance, we pay special attention to the Gaussian kernel, for which we show a dimension-independent error bound and represent its one-dimensional counterpart via a closed-form Fourier transform. We provide a run time comparison and error estimates for our fast kernel summations.
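To make the slicing idea concrete, the following minimal sketch checks it numerically for the Gaussian kernel $F(r) = \exp(-r^2/(2\sigma^2))$ in the special case $d = 3$. It rests on assumptions that are not taken from the abstract. For $d = 3$ the one-dimensional counterpart can be verified by hand to be $f(t) = (1 - t^2/\sigma^2)\exp(-t^2/(2\sigma^2))$, since the projection of a uniform direction onto a fixed axis is uniform on $[-1, 1]$. All function names are hypothetical. The one-dimensional sums are evaluated naively here, standing in for the fast Fourier or sorting-based summation, so the sketch illustrates only the reduction to one dimension, not the $O(N)$ run time.

```python
import math
import random


def gauss_kernel(r, sigma):
    """Radial basis function F(r) = exp(-r^2 / (2 sigma^2))."""
    return math.exp(-r * r / (2.0 * sigma * sigma))


def sliced_gauss_1d(t, sigma):
    """Assumed 1D counterpart of the Gaussian for d = 3 (hand-derived):
    f(t) = (1 - t^2 / sigma^2) * exp(-t^2 / (2 sigma^2))."""
    z = t * t / (sigma * sigma)
    return (1.0 - z) * math.exp(-0.5 * z)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def random_directions(num, rng):
    """Draw `num` directions uniformly from the unit sphere in R^3
    by normalising standard Gaussian vectors."""
    dirs = []
    while len(dirs) < num:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-12:
            dirs.append([c / norm for c in v])
    return dirs


def direct_sum(xs, ws, ys, sigma):
    """Reference summation s_m = sum_n w_n F(||x_n - y_m||)."""
    return [
        sum(w * gauss_kernel(math.dist(x, y), sigma) for x, w in zip(xs, ws))
        for y in ys
    ]


def sliced_sum(xs, ws, ys, sigma, dirs):
    """Sliced approximation: average over directions xi of the 1D sums
    sum_n w_n f(<xi, x_n> - <xi, y_m>). The inner 1D sum is computed
    naively; a fast 1D method would replace it in practice."""
    out = [0.0] * len(ys)
    for xi in dirs:
        px = [dot(xi, x) for x in xs]  # projected source points
        py = [dot(xi, y) for y in ys]  # projected target points
        for m, b in enumerate(py):
            out[m] += sum(w * sliced_gauss_1d(a - b, sigma) for a, w in zip(px, ws))
    return [s / len(dirs) for s in out]


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(30)]
    ys = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(5)]
    ws = [1.0 / len(xs)] * len(xs)
    exact = direct_sum(xs, ws, ys, sigma=1.0)
    approx = sliced_sum(xs, ws, ys, 1.0, random_directions(2000, rng))
    print("max abs error:", max(abs(a - e) for a, e in zip(approx, exact)))
```

Because the directions are sampled at random in this sketch, the error decays only at the Monte Carlo rate in the number of directions. The point of the example is that each direction requires only sums over projected, one-dimensional points.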