Goodness-of-fit testing is often criticized for its lack of practical relevance: since ``all models are wrong'', the null hypothesis that the data conform to our model will eventually be rejected as the sample size grows. Despite this, probabilistic models are still used extensively, raising the more pertinent question of whether the model is \emph{good enough} for the task at hand. This question can be formalized as a robust goodness-of-fit testing problem by asking whether the data were generated from a distribution that is a mild perturbation of the model. In this paper, we show that existing kernel goodness-of-fit tests are not robust under common notions of robustness, including both qualitative and quantitative robustness. We further show that robustification techniques using tilted kernels, while effective in the parameter estimation literature, are not sufficient to ensure both types of robustness in the testing setting. To address this, we propose the first robust kernel goodness-of-fit test, which resolves this open problem by using kernel Stein discrepancy (KSD) balls. This framework encompasses many well-known perturbation models, such as Huber's contamination and density-band models.