We apply the discrepancy method and a chaining approach to give improved bounds on the coreset complexity of a wide class of kernel functions. Our results give randomized polynomial time algorithms to produce coresets of size $O\big(\frac{\sqrt{d}}{\varepsilon}\sqrt{\log\log \frac{1}{\varepsilon}}\big)$ for the Gaussian and Laplacian kernels in the case that the data set is uniformly bounded, an improvement that was not possible with previous techniques. We also obtain coresets of size $O\big(\frac{1}{\varepsilon}\sqrt{\log\log \frac{1}{\varepsilon}}\big)$ for the Laplacian kernel for $d$ constant. Finally, we give the best known bounds of $O\big(\frac{\sqrt{d}}{\varepsilon}\sqrt{\log(2\max\{1,\alpha\})}\big)$ on the coreset complexity of the exponential, Hellinger, and JS Kernels, where $1/\alpha$ is the bandwidth parameter of the kernel.
翻译:我们应用差异方法与链式方法,为广泛类别核函数的核集复杂度给出了改进的边界。在数据集一致有界的情形下,我们的结果为高斯核和拉普拉斯核提供了随机多项式时间算法,可生成大小为$O\big(\frac{\sqrt{d}}{\varepsilon}\sqrt{\log\log \frac{1}{\varepsilon}}\big)$的核集,这一改进是先前技术无法实现的。对于常数$d$的拉普拉斯核,我们还获得了大小为$O\big(\frac{1}{\varepsilon}\sqrt{\log\log \frac{1}{\varepsilon}}\big)$的核集。最后,我们给出了指数核、Hellinger核与JS核的核集复杂度已知最佳边界$O\big(\frac{\sqrt{d}}{\varepsilon}\sqrt{\log(2\max\{1,\alpha\})}\big)$,其中$1/\alpha$为核的带宽参数。