Modern compression methods can summarize a target distribution $\mathbb{P}$ more succinctly than i.i.d. sampling but require access to a low-bias input sequence like a Markov chain converging quickly to $\mathbb{P}$. We introduce a new suite of compression methods suitable for compression with biased input sequences. Given $n$ points targeting the wrong distribution and quadratic time, Stein Kernel Thinning (SKT) returns $\sqrt{n}$ equal-weighted points with $\widetilde{O}(n^{-1/2})$ maximum mean discrepancy (MMD) to $\mathbb{P}$. For larger-scale compression tasks, Low-rank SKT achieves the same feat in sub-quadratic time using an adaptive low-rank debiasing procedure that may be of independent interest. For downstream tasks that support simplex or constant-preserving weights, Stein Recombination and Stein Cholesky achieve even greater parsimony, matching the guarantees of SKT with as few as $\operatorname{poly-log}(n)$ weighted points. Underlying these advances are new guarantees for the quality of simplex-weighted coresets, the spectral decay of kernel matrices, and the covering numbers of Stein kernel Hilbert spaces. In our experiments, our techniques provide succinct and accurate posterior summaries while overcoming biases due to burn-in, approximate Markov chain Monte Carlo, and tempering.
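To make the MMD quality metric concrete, here is a minimal sketch of how one might compare a small weighted coreset against a reference sample. This is not the SKT algorithm itself: the Gaussian kernel, the bandwidth, and the naive stride-based thinning are illustrative placeholders (SKT uses Stein kernels and a debiasing step not shown here).

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    # Pairwise RBF kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)).
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-sq / (2.0 * bandwidth**2))

def mmd(X, Y, wX=None, wY=None, bandwidth=1.0):
    # MMD between the weighted empirical measures sum_i wX_i d_{x_i}
    # and sum_j wY_j d_{y_j}; uniform weights by default.
    wX = np.full(len(X), 1.0 / len(X)) if wX is None else wX
    wY = np.full(len(Y), 1.0 / len(Y)) if wY is None else wY
    val = (wX @ gaussian_kernel(X, X, bandwidth) @ wX
           + wY @ gaussian_kernel(Y, Y, bandwidth) @ wY
           - 2.0 * wX @ gaussian_kernel(X, Y, bandwidth) @ wY)
    return float(np.sqrt(max(val, 0.0)))  # clamp tiny negative round-off

rng = np.random.default_rng(0)
target = rng.normal(0.0, 1.0, size=(1024, 2))   # stand-in for draws from P
biased = rng.normal(0.3, 1.0, size=(1024, 2))   # shifted, biased input sequence
coreset = biased[::32]                          # naive sqrt(n)-point thinning
print(mmd(coreset, target))
```

A debiased method like SKT would reweight or reselect points so that the coreset's MMD to $\mathbb{P}$ shrinks at the $\widetilde{O}(n^{-1/2})$ rate despite the biased input; the naive stride above inherits the input's bias and its MMD plateaus at the shift.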