Graph-based tests are a class of non-parametric two-sample tests well suited to high-dimensional data. The framework offers both flexibility and power across a wide range of testing scenarios. The test statistics are constructed from similarity graphs (such as $K$-nearest neighbor graphs), and consequently their performance is sensitive to the structure of the graph. When the graph has problematic structures, as is common for high-dimensional data, existing graph-based tests can perform poorly or unstably. We address this challenge by developing graph-based test statistics that are robust to such problematic structures, and we derive their limiting null distribution. We illustrate the new tests through simulation studies and a real-world application to Chicago taxi trip data.
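To make the construction "test statistic built from a similarity graph" concrete, the following is a minimal sketch of a classical edge-count test on a $K$-nearest neighbor graph, calibrated by permuting sample labels. It is not the robust statistic developed in this work. The function names (`knn_edges`, `between_sample_edge_count`, `edge_count_permutation_test`), the Euclidean distance, and the defaults `k=5` and `n_perm=999` are all assumptions chosen for illustration.

```python
import numpy as np


def knn_edges(points, k):
    """Undirected edge list of the k-nearest-neighbor graph (Euclidean distance)."""
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    edges = set()
    for i in range(points.shape[0]):
        for j in neighbors[i]:
            j = int(j)
            edges.add((min(i, j), max(i, j)))  # store each undirected edge once
    return np.array(sorted(edges))


def between_sample_edge_count(edges, labels):
    """Number of edges whose endpoints carry different sample labels."""
    return int(np.sum(labels[edges[:, 0]] != labels[edges[:, 1]]))


def edge_count_permutation_test(x, y, k=5, n_perm=999, seed=0):
    """Permutation p-value for the between-sample edge count.

    Unusually FEW between-sample edges suggest the two samples differ,
    so the p-value is computed in the lower tail.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    labels = np.concatenate(
        [np.zeros(len(x), dtype=int), np.ones(len(y), dtype=int)]
    )
    edges = knn_edges(pooled, k)  # the graph is built once, on pooled data
    observed = between_sample_edge_count(edges, labels)
    permuted = np.array(
        [
            between_sample_edge_count(edges, rng.permutation(labels))
            for _ in range(n_perm)
        ]
    )
    p_value = (1 + np.sum(permuted <= observed)) / (n_perm + 1)
    return observed, float(p_value)
```

A call such as `edge_count_permutation_test(x, y, k=5)` with two `(n, d)` arrays returns the observed between-sample edge count and its permutation p-value. Because the statistic depends only on the edge list, any irregularity in the graph's structure feeds directly into the test, which is the sensitivity described above.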