We introduce kernel thinning, a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel $\mathbf{k}_{\star}$ and $\mathcal{O}(n^2)$ time, kernel thinning compresses an $n$-point approximation to $\mathbb{P}$ into a $\sqrt{n}$-point approximation with comparable worst-case integration error across the associated reproducing kernel Hilbert space. The maximum discrepancy in integration error is $\mathcal{O}_d(n^{-1/2}\sqrt{\log n})$ in probability for compactly supported $\mathbb{P}$ and $\mathcal{O}_d(n^{-1/2} (\log n)^{(d+1)/2}\sqrt{\log\log n})$ for sub-exponential $\mathbb{P}$ on $\mathbb{R}^d$. In contrast, an equal-sized i.i.d. sample from $\mathbb{P}$ suffers $\Omega(n^{-1/4})$ integration error. Our sub-exponential guarantees resemble the classical quasi-Monte Carlo error rates for uniform $\mathbb{P}$ on $[0,1]^d$ but apply to general distributions on $\mathbb{R}^d$ and a wide range of common kernels. Moreover, the same construction delivers near-optimal $L^\infty$ coresets in $\mathcal{O}(n^2)$ time. We use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Mat\'ern, and B-spline kernels and present two vignettes illustrating the practical benefits of kernel thinning over i.i.d. sampling and standard Markov chain Monte Carlo thinning, in dimensions $d=2$ through $100$.
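To make the error measure concrete: the worst-case integration error over the unit ball of a reproducing kernel Hilbert space is the maximum mean discrepancy (MMD), which for two point sets reduces to averages of kernel evaluations. The following is a minimal sketch of the standard-thinning baseline only, not of kernel thinning itself. It assumes a Gaussian kernel with bandwidth 1 and a standard normal target in $d=2$; the function names `gaussian_kernel`, `mmd`, and `standard_thin` are hypothetical and chosen purely for illustration.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd(x, y, bandwidth=1.0):
    """MMD between the empirical distributions of the point sets x and y."""
    kxx = gaussian_kernel(x, x, bandwidth).mean()
    kyy = gaussian_kernel(y, y, bandwidth).mean()
    kxy = gaussian_kernel(x, y, bandwidth).mean()
    return float(np.sqrt(max(kxx + kyy - 2.0 * kxy, 0.0)))


def standard_thin(points, out_size):
    """Standard thinning: keep every (n / out_size)-th point of the input sequence."""
    step = len(points) // out_size
    return points[step - 1 :: step][:out_size]


# Assumed setup for illustration: n i.i.d. draws from a standard normal in d = 2.
rng = np.random.default_rng(0)
n, d = 1024, 2
sample = rng.standard_normal((n, d))

# Compress the n-point approximation to sqrt(n) points by standard thinning.
coreset = standard_thin(sample, int(np.sqrt(n)))
baseline_mmd = mmd(sample, coreset)
print(f"coreset size: {len(coreset)}, MMD to the full sample: {baseline_mmd:.4f}")
```

For an i.i.d. input, standard thinning amounts to taking an i.i.d. subsample of size $\sqrt{n}$, so the printed value illustrates the $\Omega(n^{-1/4})$ baseline that kernel thinning is designed to improve on.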