Graph data has a unique structure that deviates from standard data assumptions, often necessitating modifications to existing methods or the development of new ones to ensure valid statistical analysis. In this paper, we explore the notion of correlation and dependence between two binary graphs. Given vertex communities, we propose community correlations to measure edge association; each community correlation equals zero if and only if the two graphs are conditionally independent within the corresponding pair of communities. The set of community correlations naturally leads to the maximum community correlation, whose vanishing indicates conditional independence on all possible pairs of communities, and to the overall graph correlation, which equals zero if and only if the two binary graphs are unconditionally independent. We then compute the sample community correlations via graph encoder embedding, prove that they converge to their respective population versions, and derive the asymptotic null distribution to enable a fast, valid, and consistent test for conditional or unconditional independence between two binary graphs. The theoretical results are validated through comprehensive simulations, and we provide two real-data examples, one using Enron email networks and another using mouse connectome graphs, to demonstrate the utility of the proposed correlation measures.