We investigate properties of goodness-of-fit tests based on the Kernel Stein Discrepancy (KSD). We introduce a strategy to construct a test, called KSDAgg, which aggregates multiple tests with different kernels. KSDAgg avoids splitting the data to perform kernel selection (which leads to a loss in test power), and rather maximises the test power over a collection of kernels. We provide non-asymptotic guarantees on the power of KSDAgg: we show it achieves the smallest uniform separation rate of the collection, up to a logarithmic term. For compactly supported densities with bounded model score function, we derive the rate for KSDAgg over restricted Sobolev balls; this rate corresponds to the minimax optimal rate over unrestricted Sobolev balls, up to an iterated logarithmic term. KSDAgg can be computed exactly in practice as it relies either on a parametric bootstrap or on a wild bootstrap to estimate the quantiles and the level corrections. In particular, for the crucial choice of bandwidth of a fixed kernel, it avoids resorting to arbitrary heuristics (such as median or standard deviation) or to data splitting. We find on both synthetic and real-world data that KSDAgg outperforms other state-of-the-art quadratic-time adaptive KSD-based goodness-of-fit testing procedures.
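To make the aggregation idea concrete, the following is a minimal sketch of how a test of this kind could be assembled; it is not the reference implementation. It makes several assumptions that the abstract does not fix: a Gaussian base kernel, a collection indexed by bandwidth, uniform weights over the collection, a U-statistic KSD estimator, a wild bootstrap with Rademacher signs, and a bisection search for the level correction. The names `stein_kernel_matrix` and `ksd_agg_sketch`, and all default parameter values, are hypothetical.

```python
import numpy as np


def stein_kernel_matrix(x, score, bandwidth):
    """Stein kernel matrix for a Gaussian base kernel (an assumed choice)."""
    n, d = x.shape
    s = score(x)                                     # model score at each point, (n, d)
    diff = x[:, None, :] - x[None, :, :]             # diff[i, j] = x_i - x_j
    sqdist = np.sum(diff ** 2, axis=-1)
    h2 = bandwidth ** 2
    k = np.exp(-sqdist / (2.0 * h2))
    term_ss = s @ s.T                                # s(x_i) . s(x_j)
    term_sx = np.einsum("id,ijd->ij", s, diff) / h2  # s(x_i) . (x_i - x_j) / h^2
    term_sy = np.einsum("jd,ijd->ij", s, diff) / h2  # s(x_j) . (x_i - x_j) / h^2
    trace = d / h2 - sqdist / h2 ** 2                # trace of the mixed second derivative, over k
    return k * (term_ss + term_sx - term_sy + trace)


def ksd_agg_sketch(x, score, bandwidths, alpha=0.05, n_boot=500, n_bisect=20, seed=0):
    """Aggregate KSD tests over several bandwidths; returns (reject, correction u)."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    n_kernels = len(bandwidths)
    weight = 1.0 / n_kernels                         # uniform weights (assumption)
    # Two independent sets of Rademacher signs: one for the quantiles,
    # one for estimating the level correction.
    eps1 = rng.choice([-1.0, 1.0], size=(n_boot, n))
    eps2 = rng.choice([-1.0, 1.0], size=(n_boot, n))
    stats = np.empty(n_kernels)
    boot1 = np.empty((n_kernels, n_boot + 1))
    boot2 = np.empty((n_kernels, n_boot))
    norm = n * (n - 1)
    for m, h in enumerate(bandwidths):
        H = stein_kernel_matrix(x, score, h)
        np.fill_diagonal(H, 0.0)                     # U-statistic: drop i == j terms
        stats[m] = H.sum() / norm
        boot1[m, :n_boot] = np.sum((eps1 @ H) * eps1, axis=1) / norm
        boot1[m, n_boot] = stats[m]                  # include the observed statistic
        boot2[m] = np.sum((eps2 @ H) * eps2, axis=1) / norm
    boot1.sort(axis=1)

    def quantiles(u):
        # Empirical (1 - u * weight)-quantile of each kernel's bootstrap sample.
        idx = int(np.ceil((n_boot + 1) * (1.0 - u * weight))) - 1
        return boot1[:, min(max(idx, 0), n_boot)]

    # Bisection for the largest u whose estimated aggregated level is <= alpha.
    u_lo, u_hi = 0.0, 1.0 / weight
    for _ in range(n_bisect):
        u_mid = 0.5 * (u_lo + u_hi)
        exceed = np.any(boot2 > quantiles(u_mid)[:, None], axis=0)
        if exceed.mean() <= alpha:
            u_lo = u_mid
        else:
            u_hi = u_mid
    # Reject if any single test rejects at its corrected level.
    reject = bool(np.any(stats > quantiles(u_lo)))
    return reject, u_lo
```

For a standard Gaussian model the score is `lambda z: -z`, so a call could look like `ksd_agg_sketch(x, lambda z: -z, [0.5, 1.0, 2.0])` with `x` an array of shape `(n, d)`. The point of the sketch is that no bandwidth is selected by a heuristic or on held-out data: every bandwidth in the collection is tested on the full sample, and the correction `u` keeps the overall level near `alpha`.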