Modern compression methods can summarize a target distribution $\mathbb{P}$ more succinctly than i.i.d. sampling but require access to a low-bias input sequence like a Markov chain converging quickly to $\mathbb{P}$. We introduce a new suite of compression methods suitable for compression with biased input sequences. Given $n$ points targeting the wrong distribution and quadratic time, Stein kernel thinning (SKT) returns $\sqrt{n}$ equal-weighted points with $\widetilde{O}(n^{-1/2})$ maximum mean discrepancy (MMD) to $\mathbb{P}$. For larger-scale compression tasks, low-rank SKT achieves the same feat in sub-quadratic time using an adaptive low-rank debiasing procedure that may be of independent interest. For downstream tasks that support simplex or constant-preserving weights, Stein recombination and Stein Cholesky achieve even greater parsimony, matching the guarantees of SKT with as few as $\text{poly-log}(n)$ weighted points. Underlying these advances are new guarantees for the quality of simplex-weighted coresets, the spectral decay of kernel matrices, and the covering numbers of Stein kernel Hilbert spaces. In our experiments, our techniques provide succinct and accurate posterior summaries while overcoming biases due to burn-in, approximate Markov chain Monte Carlo, and tempering.