Kernel methods underpin many of the most successful approaches in data science and statistics, and they allow probability measures to be represented as elements of a reproducing kernel Hilbert space without loss of information. Recently, the kernel Stein discrepancy (KSD), which combines Stein's method with kernel techniques, has gained considerable attention. Through the Stein operator, the KSD enables the construction of powerful goodness-of-fit tests for which it suffices to know the target distribution only up to a multiplicative constant. However, the typical U- and V-statistic-based KSD estimators suffer from quadratic runtime complexity, which hinders their application in large-scale settings. In this work, we propose a Nystr\"om-based acceleration of the KSD with runtime $\mathcal O\!\left(mn+m^3\right)$ for $n$ samples and $m\ll n$ Nystr\"om points, show its $\sqrt{n}$-consistency under the null with a classical sub-Gaussian assumption, and demonstrate its applicability for goodness-of-fit testing on a suite of benchmarks.
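For concreteness, the quadratic-time baseline that such an acceleration targets can be written as follows; this is a standard formulation rather than the paper's own construction, and it assumes the Langevin Stein operator, a target density $p$ known up to its normalizing constant, and a base kernel $k$ on $\mathbb R^d$:
\[
  h_p(x,y) \;=\; \nabla\log p(x)^{\top}\nabla\log p(y)\,k(x,y)
  \;+\; \nabla\log p(x)^{\top}\nabla_y k(x,y)
  \;+\; \nabla\log p(y)^{\top}\nabla_x k(x,y)
  \;+\; \sum_{i=1}^{d}\frac{\partial^2 k(x,y)}{\partial x_i\,\partial y_i},
\]
\[
  \widehat{\mathrm{KSD}}^2_V \;=\; \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} h_p(x_i,x_j),
  \qquad \text{cost } \mathcal O\!\left(n^2\right).
\]
In a generic Nystr\"om scheme, only $m\ll n$ landmark samples are retained and the estimator is computed from the resulting $n\times m$ block of Stein-kernel evaluations together with an $m\times m$ system, which is consistent with the stated $\mathcal O\!\left(mn+m^3\right)$ cost; the precise construction and its $\sqrt{n}$-consistency analysis are developed in the paper itself.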