Two-sample hypothesis testing is a fundamental problem with many applications, and it faces new challenges in the high-dimensional setting. To mitigate the curse of dimensionality, high-dimensional data are typically assumed to lie on a low-dimensional manifold. To incorporate the geometric information in the data, we propose to apply the Delaunay triangulation and develop the Delaunay weight to measure the geometric proximity among data points. In contrast to existing similarity measures that use only pairwise distances, the Delaunay weight takes both distance and direction information into account. We develop a detailed computational procedure to approximate the Delaunay weight when the manifold is unknown. We further propose a novel nonparametric test statistic based on the Delaunay weight matrix to test whether two samples come from the same underlying distribution. Applied to simulated data, the new test exhibits a substantial power gain in detecting differences in principal directions between distributions. The proposed test also shows high power on a real dataset of human face images.
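To give a rough sense of how a Delaunay triangulation can feed a two-sample test, the following is a minimal sketch, assuming NumPy and SciPy are available. It is not the Delaunay weight or the test statistic proposed here. It substitutes a much simpler technique: it builds the Euclidean Delaunay triangulation of the pooled sample, counts the edges that join points from different samples, and calibrates that count with a permutation test. It also ignores the manifold approximation step entirely. The function names `delaunay_edges` and `cross_edge_permutation_test` are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.spatial import Delaunay


def delaunay_edges(points):
    """Return the unique undirected edges of the Delaunay triangulation."""
    tri = Delaunay(points)
    edges = set()
    # Every pair of vertices within a simplex is an edge of the triangulation.
    for simplex in tri.simplices:
        for i in range(len(simplex)):
            for j in range(i + 1, len(simplex)):
                a, b = int(simplex[i]), int(simplex[j])
                edges.add((min(a, b), max(a, b)))
    return np.array(sorted(edges))


def cross_edge_permutation_test(x, y, n_perm=500, seed=0):
    """Permutation p-value for the number of edges joining the two samples.

    Assumption (illustrative only): an unusually small number of
    cross-sample edges is taken as evidence against equal distributions.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    labels = np.r_[np.zeros(len(x), dtype=int), np.ones(len(y), dtype=int)]
    edges = delaunay_edges(pooled)

    def count_cross(lab):
        # An edge is "cross-sample" when its endpoints carry different labels.
        return int(np.sum(lab[edges[:, 0]] != lab[edges[:, 1]]))

    observed = count_cross(labels)
    perm_counts = np.array(
        [count_cross(rng.permutation(labels)) for _ in range(n_perm)]
    )
    # One-sided p-value with the usual +1 correction.
    p_value = (1 + np.sum(perm_counts <= observed)) / (n_perm + 1)
    return observed, float(p_value)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two samples with the same mean but different principal directions.
    x = rng.normal(size=(60, 2)) * np.array([3.0, 0.5])
    y = rng.normal(size=(60, 2)) * np.array([0.5, 3.0])
    stat, p = cross_edge_permutation_test(x, y)
    print(f"cross-sample edges: {stat}, permutation p-value: {p:.3f}")
```

Because this sketch uses only the presence or absence of an edge, it discards the direction information that the Delaunay weight is designed to capture, so its behavior should not be read as indicative of the proposed test.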