Random graphs are statistical models with many applications, ranging from neuroscience to social network analysis. Of particular interest in some applications is the problem of testing whether two random graphs have the same generating distribution. Tang et al. (2017) propose a test for this setting. The test embeds each graph into a low-dimensional space via the adjacency spectral embedding (ASE) and then applies a kernel two-sample test based on the maximum mean discrepancy (MMD) to the two embeddings. However, if the two graphs being compared have different numbers of vertices, the test of Tang et al. (2017) may not be valid. We demonstrate the intuition behind this invalidity and propose a correction that makes any subsequent kernel- or distance-based test valid. Our method relies on sampling based on the asymptotic distribution of the ASE. We call these altered embeddings the corrected adjacency spectral embeddings (CASE). We also show that CASE remedies the exchangeability problem of the original test, and we demonstrate the validity and consistency of the test that uses CASE via a simulation study. Lastly, we apply our proposed test to the problem of determining whether human connectomes extracted from diffusion magnetic resonance imaging (dMRI) at different scales share the same generating distribution.
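The uncorrected pipeline described above (ASE of each graph, then an MMD two-sample test on the two sets of embedded vertices) can be sketched as follows. This is a minimal illustration under stated assumptions, not Tang et al.'s implementation and not CASE: it assumes NumPy, Erdős–Rényi test graphs, a Gaussian kernel with a hand-picked bandwidth, a biased (V-statistic) MMD estimate, a permutation test for the p-value, and a simple sign-flip heuristic in place of a full orthogonal alignment of the embeddings. All function names are hypothetical.

```python
import numpy as np


def sample_er(n, p, rng):
    """Hypothetical helper: symmetric, hollow Erdos-Renyi adjacency matrix."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)


def adjacency_spectral_embedding(A, d):
    """n x d ASE: top-d eigenvectors of A (ranked by |eigenvalue|),
    each scaled by sqrt(|eigenvalue|)."""
    vals, vecs = np.linalg.eigh(A)  # A is symmetric
    idx = np.argsort(np.abs(vals))[::-1][:d]
    X = vecs[:, idx] * np.sqrt(np.abs(vals[idx]))
    # Eigenvectors are only defined up to sign; flip each column so its sum is
    # non-negative (a heuristic, not a full orthogonal alignment).
    signs = np.where(X.sum(axis=0) < 0, -1.0, 1.0)
    return X * signs


def gaussian_mmd2(X, Y, bandwidth=0.5):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel.
    The bandwidth is an assumed, hand-picked value."""
    def k(P, Q):
        sq = np.sum(P**2, axis=1)[:, None] + np.sum(Q**2, axis=1)[None, :] - 2 * P @ Q.T
        return np.exp(-sq / (2 * bandwidth**2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()


def mmd_permutation_test(X, Y, n_perm=200, seed=0):
    """Permutation p-value: pool the embedded vertices, reshuffle the labels,
    and count how often the permuted statistic reaches the observed one."""
    rng = np.random.default_rng(seed)
    observed = gaussian_mmd2(X, Y)
    Z = np.vstack([X, Y])
    n = len(X)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(Z))
        if gaussian_mmd2(Z[perm[:n]], Z[perm[n:]]) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two graphs from the same ER model but with different vertex counts.
    X1 = adjacency_spectral_embedding(sample_er(100, 0.5, rng), d=1)
    X2 = adjacency_spectral_embedding(sample_er(400, 0.5, rng), d=1)
    stat, pval = mmd_permutation_test(X1, X2)
    print(f"MMD^2 = {stat:.4f}, permutation p-value = {pval:.3f}")
```

The demo compares graphs of different sizes drawn from the same model. Because the spread of each ASE row around its latent position shrinks as the number of vertices grows, the two embedded samples are not identically distributed at finite sizes even under the null, so the permutation labels are not exchangeable; this is the kind of setting in which the abstract says the uncorrected test may not be valid.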