Two-sample hypothesis testing for network comparison presents several significant challenges: exploiting repeated network observations and known node registration when they are available, without requiring either; relaxing strong structural assumptions; achieving finite-sample higher-order accuracy; handling different network sizes and sparsity levels; computing quickly with a small memory footprint; controlling the false discovery rate (FDR) in multiple testing; and developing theoretical understanding, particularly of finite-sample accuracy and minimax optimality. In this paper, we develop a comprehensive toolbox, featuring a novel main method and its variants, all accompanied by strong theoretical guarantees, to address these challenges. Our method outperforms existing tools in speed and accuracy, and we prove that it is power-optimal. Our algorithms are user-friendly and handle a variety of data structures (single or repeated network observations; known or unknown node registration). We also develop an innovative framework for offline hashing and fast querying, which is well suited to large network databases. We demonstrate the effectiveness of our method through comprehensive simulations and applications to two real-world datasets, which reveal intriguing new structures.