In the statistical literature, as well as in artificial intelligence and machine learning, measures of discrepancy between two probability distributions are widely used to construct goodness-of-fit measures. We concentrate on quadratic distances, which depend on a non-negative definite kernel. We propose a unified framework for the study of two-sample and $k$-sample goodness-of-fit tests based on the concept of matrix distance. We provide a succinct review of the goodness-of-fit literature on distance measures, and specifically on quadratic distances. We show that the kernel-based quadratic distance two-sample test has the same functional form as the maximum mean discrepancy test. We develop tests for the $k$-sample scenario, of which the two-sample problem is a special case. We derive their asymptotic distributions under the null hypothesis and discuss computational aspects of the test procedures. We assess their performance, in terms of level and power, via extensive simulations and a real data example. The proposed framework is implemented in the QuadratiK package, available in both R and Python environments.
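To make the functional form shared by the kernel-based two-sample statistic and the maximum mean discrepancy concrete, the following is a minimal sketch, not the QuadratiK implementation. It assumes a plain (uncentered) Gaussian kernel with a hypothetical bandwidth `h`, uses a U-statistic estimate, and substitutes a permutation p-value for the asymptotic null distribution derived in the paper. All function names are illustrative.

```python
import numpy as np


def gaussian_kernel(a, b, h):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * h**2))


def two_sample_statistic(x, y, h=1.0):
    """U-statistic estimate of the squared kernel distance between samples."""
    n, m = len(x), len(y)
    kxx = gaussian_kernel(x, x, h)
    kyy = gaussian_kernel(y, y, h)
    kxy = gaussian_kernel(x, y, h)
    # Within-sample terms exclude the diagonal (i == j) entries.
    term_x = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    # Between-sample term averages over all n * m pairs.
    return term_x + term_y - 2.0 * kxy.mean()


def permutation_pvalue(x, y, h=1.0, n_perm=200, seed=0):
    """Permutation p-value (an assumption here, not the paper's asymptotics)."""
    rng = np.random.default_rng(seed)
    observed = two_sample_statistic(x, y, h)
    pooled = np.vstack([x, y])
    n = len(x)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        stat = two_sample_statistic(pooled[idx[:n]], pooled[idx[n:]], h)
        count += int(stat >= observed)
    return observed, (count + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(100, 2))
    y = rng.normal(loc=2.0, size=(100, 2))
    stat, pval = permutation_pvalue(x, y)
    print(f"statistic = {stat:.4f}, permutation p-value = {pval:.4f}")
```

The statistic combines two within-sample kernel averages and one between-sample average, which is the structure the abstract refers to as common to both tests. The bandwidth is fixed at 1.0 purely for illustration; in practice its choice affects power and would need to be tuned.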