The kernel thinning (KT) algorithm of Dwivedi & Mackey (2024) provides a better-than-i.i.d. compression of a generic set of points. By generating high-fidelity coresets of size significantly smaller than the number of input points, KT is known to speed up unsupervised tasks like Monte Carlo integration, uncertainty quantification, and non-parametric hypothesis testing, with minimal loss in statistical accuracy. In this work, we generalize the KT algorithm to speed up supervised learning problems involving kernel methods. Specifically, we combine KT with two classical algorithms, Nadaraya-Watson (NW) regression (also known as kernel smoothing) and kernel ridge regression (KRR), to provide a quadratic speed-up in both training and inference times. We show how distribution compression with KT in each setting reduces to constructing an appropriate kernel, and introduce the Kernel-Thinned NW and Kernel-Thinned KRR estimators. We prove that KT-based regression estimators enjoy significantly superior computational efficiency over the full-data estimators and improved statistical efficiency over i.i.d. subsampling of the training data. En route, we also provide a novel multiplicative error guarantee for compressing with KT. We validate our design choices with both simulations and real data experiments.
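To make the pipeline concrete, here is a minimal sketch of the two KT-based estimators described above: thin the training set to a coreset of size roughly the square root of n, then run standard NW regression and KRR on the coreset alone. The `kernel_thin` function below is a placeholder (uniform subsampling) standing in for a real kernel thinning implementation; the bandwidth and regularization values are illustrative, not taken from the paper.

```python
import numpy as np

def nw_regress(X_train, y_train, X_test, bandwidth):
    """Nadaraya-Watson estimator: kernel-weighted average of training responses."""
    # Pairwise squared distances between test and training points.
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * bandwidth ** 2))  # Gaussian kernel weights
    return (W @ y_train) / W.sum(axis=1)

def krr_fit(X_train, y_train, bandwidth, reg):
    """Kernel ridge regression: solve (K + n*reg*I) alpha = y."""
    d2 = ((X_train[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * bandwidth ** 2))
    n = len(y_train)
    return np.linalg.solve(K + n * reg * np.eye(n), y_train)

def krr_predict(X_train, alpha, X_test, bandwidth):
    """Evaluate the fitted KRR function at the test points."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * bandwidth ** 2))
    return K @ alpha

def kernel_thin(X, y, m):
    """Placeholder for kernel thinning of the (covariate, response) pairs.
    A real implementation would run the KT algorithm with an appropriately
    constructed kernel; uniform subsampling stands in so the sketch runs."""
    idx = np.random.default_rng(0).choice(len(X), size=m, replace=False)
    return X[idx], y[idx]

# Toy data: n training points, coreset of size ~ sqrt(n).
rng = np.random.default_rng(1)
n = 4096
X = rng.uniform(-1, 1, size=(n, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(n)
X_test = np.linspace(-1, 1, 200)[:, None]

Xc, yc = kernel_thin(X, y, m=int(np.sqrt(n)))  # 4096 -> 64 points

y_nw = nw_regress(Xc, yc, X_test, bandwidth=0.2)           # KT-NW
alpha = krr_fit(Xc, yc, bandwidth=0.2, reg=1e-3)
y_krr = krr_predict(Xc, alpha, X_test, bandwidth=0.2)      # KT-KRR
```

Since both estimators cost quadratic (NW inference) or cubic (KRR training) time in the number of retained points, replacing the n training points with a coreset of size on the order of sqrt(n) is what yields the quadratic speed-up claimed above.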