Kernel methods underpin many of the most successful approaches in data science and statistics, and they allow probability measures to be represented as elements of a reproducing kernel Hilbert space without loss of information. Recently, the kernel Stein discrepancy (KSD), which combines Stein's method with kernel techniques, has gained considerable attention. Through the Stein operator, KSD allows the construction of powerful goodness-of-fit tests that require the target distribution to be known only up to a multiplicative constant. However, the typical U- and V-statistic-based KSD estimators have quadratic runtime complexity in the sample size, which hinders their application in large-scale settings. In this work, we propose a Nyström-based KSD acceleration with runtime $\mathcal O\!\left(mn+m^3\right)$ for $n$ samples and $m\ll n$ Nyström points, show its $\sqrt{n}$-consistency under the null with a classical sub-Gaussian assumption, and demonstrate its applicability for goodness-of-fit testing on a suite of benchmarks.
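To make the $\mathcal O\!\left(mn+m^3\right)$ cost concrete, the following is a minimal sketch of one way a Nyström-type KSD estimate can be computed. It is not taken from the paper: it assumes a Gaussian base kernel with a hypothetical bandwidth `sigma`, uniform subsampling of the $m$ Nyström points, and an estimator of the form $\beta^\top K_{m,m}\beta$ with $\beta = \tfrac{1}{n} K_{m,m}^{-} K_{m,n} \mathbf 1_n$, where $K$ denotes Stein kernel matrices and $K_{m,m}^{-}$ a pseudo-inverse. The function names, the `tol` eigenvalue cutoff, and the standard normal target used in the usage lines are illustrative assumptions.

```python
import numpy as np


def stein_kernel_matrix(X, Y, score, sigma=1.0):
    """Stein kernel h_p(x, y) for all pairs (x in X, y in Y), Gaussian base kernel.

    `score` maps an (a, d) array of points to the (a, d) array of grad log p,
    so the target density p is only needed up to a multiplicative constant.
    """
    SX, SY = score(X), score(Y)
    d = X.shape[1]
    # Squared distances ||x - y||^2 and base kernel k(x, y) = exp(-r2 / (2 sigma^2)).
    r2 = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    r2 = np.maximum(r2, 0.0)
    k = np.exp(-r2 / (2.0 * sigma**2))
    # s(x)^T (x - y) and s(y)^T (x - y), needed for the gradient terms.
    sx_diff = (SX * X).sum(1)[:, None] - SX @ Y.T
    sy_diff = X @ SY.T - (SY * Y).sum(1)[None, :]
    # h = k * [ s(x).s(y) + (s(x) - s(y)).(x - y)/sigma^2 + d/sigma^2 - r2/sigma^4 ]
    return k * (SX @ SY.T + (sx_diff - sy_diff) / sigma**2
                + d / sigma**2 - r2 / sigma**4)


def ksd_v_statistic(X, score, sigma=1.0):
    """Quadratic-time V-statistic estimate of the squared KSD (baseline)."""
    return float(stein_kernel_matrix(X, X, score, sigma).mean())


def ksd_nystrom(X, score, m, sigma=1.0, rng=None, tol=1e-10):
    """Nystrom-type estimate of the squared KSD (assumed form, see lead-in)."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    Z = X[rng.choice(n, size=m, replace=False)]      # m Nystrom points
    K_mm = stein_kernel_matrix(Z, Z, score, sigma)    # O(m^2) entries
    # K_mn @ 1_n: row sums of the m x n Stein kernel matrix, O(mn).
    v = stein_kernel_matrix(Z, X, score, sigma).sum(axis=1)
    # Pseudo-inverse solve via a symmetric eigendecomposition, O(m^3);
    # eigenvalues below tol * (largest eigenvalue) are discarded.
    w, U = np.linalg.eigh((K_mm + K_mm.T) / 2.0)
    keep = w > tol * w.max()
    beta = U[:, keep] @ ((U[:, keep].T @ v) / w[keep]) / n
    return float(beta @ K_mm @ beta)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    score = lambda X: -X                              # standard normal target
    X_null = rng.standard_normal((2000, 2))           # samples from the target
    X_alt = X_null + 1.0                              # mean-shifted samples
    print("null:", ksd_nystrom(X_null, score, m=50, rng=1))
    print("alt: ", ksd_nystrom(X_alt, score, m=50, rng=1))
```

In this sketch, taking all $n$ samples as Nyström points recovers the V-statistic up to numerical error, which gives a convenient sanity check. Turning the estimate into a goodness-of-fit test additionally requires a rejection threshold (for example from a bootstrap), which is not shown here.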