Kernel methods underpin many of the most successful approaches in data science and statistics, and they allow probability measures to be represented as elements of a reproducing kernel Hilbert space without loss of information. Recently, the kernel Stein discrepancy (KSD), which combines Stein's method with the flexibility of kernel techniques, has gained considerable attention. Through the Stein operator, the KSD allows the construction of powerful goodness-of-fit tests where it is sufficient to know the target distribution up to a multiplicative constant. However, the typical U- and V-statistic-based KSD estimators suffer from quadratic runtime complexity, which hinders their application in large-scale settings. In this work, we propose a Nystr\"om-based KSD acceleration -- with runtime $\mathcal O\left(mn+m^3\right)$ for $n$ samples and $m\ll n$ Nystr\"om points -- show its $\sqrt{n}$-consistency under a classical sub-Gaussian assumption, and demonstrate its applicability for goodness-of-fit testing on a suite of benchmarks.
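To illustrate where a runtime of the form $\mathcal O(mn+m^3)$ can arise, the following is a minimal NumPy sketch of a Nystr\"om-type KSD estimate, not the paper's exact estimator: it assumes a Gaussian base kernel, the Langevin Stein kernel, uniform landmark subsampling, and a pseudo-inverse-based projection of the empirical Stein mean embedding onto the landmark span. The function names (stein_kernel, nystroem_ksd), the bandwidth sigma, and the landmark choice are illustrative assumptions.

```python
import numpy as np

def stein_kernel(X, Y, score_fn, sigma=1.0):
    """Langevin Stein kernel h_p(x, y) for a Gaussian base kernel.

    X: (n, d), Y: (m, d); score_fn returns grad log p row-wise.
    Returns the (n, m) matrix [h_p(x_i, y_j)].  (Illustrative sketch.)
    """
    n, d = X.shape
    sx = score_fn(X)                               # (n, d) scores at X
    sy = score_fn(Y)                               # (m, d) scores at Y
    diff = X[:, None, :] - Y[None, :, :]           # (n, m, d) pairwise differences
    sq = np.sum(diff ** 2, axis=-1)                # (n, m) squared distances
    k = np.exp(-sq / (2.0 * sigma ** 2))           # Gaussian kernel values
    term1 = (sx @ sy.T) * k                                        # s(x)·s(y) k(x,y)
    term2 = np.einsum('id,ijd->ij', sx, diff) / sigma ** 2 * k     # s(x)·grad_y k
    term3 = -np.einsum('jd,ijd->ij', sy, diff) / sigma ** 2 * k    # s(y)·grad_x k
    term4 = (d / sigma ** 2 - sq / sigma ** 4) * k                 # tr grad_x grad_y k
    return term1 + term2 + term3 + term4

def nystroem_ksd(X, score_fn, m, sigma=1.0, rng=None):
    """Nystroem-type KSD estimate: project the empirical Stein mean embedding
    onto the span of m landmark feature maps; cost O(mn + m^3) Stein-kernel
    evaluations and linear algebra.  (Illustrative sketch.)"""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)     # uniform landmark subsampling
    Z = X[idx]
    h_mn = stein_kernel(Z, X, score_fn, sigma)     # (m, n): O(mn) kernel evaluations
    h_mm = stein_kernel(Z, Z, score_fn, sigma)     # (m, m) landmark Gram matrix
    a = h_mn.mean(axis=1)                          # a_j = (1/n) sum_i h_p(z_j, x_i)
    ksd_sq = a @ np.linalg.pinv(h_mm) @ a          # squared norm of projection: O(m^3)
    return np.sqrt(max(ksd_sq, 0.0))

# Usage: samples against a standard normal target, whose score is s(x) = -x.
X = np.random.default_rng(0).normal(size=(2000, 2))
print(nystroem_ksd(X, lambda x: -x, m=50, sigma=1.0, rng=0))
```

The sketch makes the stated complexity visible: the $m\times n$ Stein-kernel block costs $\mathcal O(mn)$ evaluations, while inverting the $m\times m$ landmark block costs $\mathcal O(m^3)$, in contrast to the $\mathcal O(n^2)$ cost of the full U- or V-statistic.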