Two-sample tests built on a similarity graph over the observations are useful for high-dimensional and non-Euclidean data because they are flexible and perform well under a wide range of alternatives. Existing work has mainly focused on sparse graphs, such as graphs whose number of edges is of the same order as the number of observations, and its asymptotic results impose strong conditions on the graph that are easily violated by the commonly constructed graphs those works recommend. Moreover, graph-based tests perform better with denser graphs in many settings. In this work, we establish the theoretical foundation for graph-based tests on graphs ranging from those recommended in the current literature to much denser ones.