Two-sample network hypothesis testing is an important inference task with applications across diverse fields such as medicine, neuroscience, and sociology. Many of these testing methodologies operate under the implicit assumption that the vertex correspondence across networks is known a priori. This assumption often fails to hold, and the power of the subsequent test can degrade when vertices are misaligned (label-shuffled) across networks. This power loss due to shuffling is explored theoretically in the context of random dot product and stochastic block model networks for a pair of hypothesis tests based on Frobenius norm differences between estimated edge probability matrices or between adjacency matrices. The loss in testing power is further corroborated by simulations and experiments in both the stochastic block model and the random dot product graph model, which consider the power loss across multiple tests recently proposed in the literature. Lastly, the impact that shuffling can have in real-data testing is demonstrated in a pair of examples from neuroscience and from social network analysis.
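To make the effect of shuffling on a Frobenius-norm statistic concrete, the following is a minimal sketch, not the authors' procedure. It assumes a toy two-block stochastic block model, and every name in it (`sbm_edge_probabilities`, `shuffle_vertices`, `frobenius_statistic`, `sample_adjacency`) and every parameter value is hypothetical, chosen only for illustration. It compares an edge probability matrix with a relabeled copy of itself, so any nonzero value of the statistic is due to label shuffling alone.

```python
import numpy as np


def sbm_edge_probabilities(block_sizes, B):
    """Return the n x n edge probability matrix of a stochastic block model."""
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    return B[np.ix_(labels, labels)]


def shuffle_vertices(M, k):
    """Relabel M by swapping its first k vertices with its last k (needs 2k <= n)."""
    n = M.shape[0]
    perm = np.arange(n)
    perm[:k], perm[n - k:] = np.arange(n - k, n), np.arange(k)
    return M[np.ix_(perm, perm)]


def frobenius_statistic(M1, M2):
    """Squared Frobenius norm of the difference between two matrices."""
    return float(np.sum((M1 - M2) ** 2))


def sample_adjacency(P, rng):
    """Sample a symmetric, hollow 0/1 adjacency matrix with edge probabilities P."""
    n = P.shape[0]
    upper = np.triu((rng.random((n, n)) < P).astype(int), k=1)
    return upper + upper.T


# Assumed toy model: two blocks of 50 vertices each (illustrative values only).
B = np.array([[0.5, 0.2],
              [0.2, 0.5]])
P = sbm_edge_probabilities([50, 50], B)

# Both "networks" share the same model; only the labels of the second are shuffled.
for k in (0, 5, 10, 20):
    stat = frobenius_statistic(P, shuffle_vertices(P, k))
    print(f"k = {k:2d} shuffled pairs: ||P - P_shuffled||_F^2 = {stat:.2f}")

# The same comparison on sampled adjacency matrices is noisy, so no value is claimed.
rng = np.random.default_rng(0)
A1, A2 = sample_adjacency(P, rng), sample_adjacency(P, rng)
print(frobenius_statistic(A1, A2), frobenius_statistic(A1, shuffle_vertices(A2, 10)))
```

In this toy setting the statistic on edge probability matrices is 0 with no shuffling. For k = 10 it equals 0.09 × 3200 = 288, since 3200 ordered vertex pairs switch between the within-block probability 0.5 and the cross-block probability 0.2. The two networks here are identically distributed, so the example only shows that mislabeling perturbs the statistic; how that perturbation translates into power loss is the subject of the paper's analysis.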