Recent studies on Graph Neural Networks (GNNs) provide both empirical and theoretical evidence of their effectiveness in capturing structural patterns on homophilic graphs and on certain heterophilic graphs. Notably, most real-world graphs, whether homophilic or heterophilic, contain a mixture of nodes with homophilic and heterophilic structural patterns, exhibiting a structural disparity. However, the analysis of GNN performance on nodes with different structural patterns, e.g., homophilic nodes in heterophilic graphs, remains rather limited. In the present study, we provide evidence that GNNs for node classification typically perform well on homophilic nodes within homophilic graphs and on heterophilic nodes within heterophilic graphs, while struggling on the opposite node set, exhibiting a performance disparity. We theoretically and empirically identify the effects of GNNs on test nodes exhibiting distinct structural patterns. We then propose a rigorous, non-i.i.d. PAC-Bayesian generalization bound for GNNs, which reveals the reasons for the performance disparity, namely the aggregated feature distance and the homophily ratio difference between training and test nodes. Furthermore, we demonstrate the practical implications of our findings by (1) elucidating the effectiveness of deeper GNNs and (2) revealing an overlooked distribution shift factor in the graph out-of-distribution problem and proposing a new scenario accordingly.
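To make the notions of homophilic and heterophilic nodes and of a homophily ratio difference concrete, the following is a minimal sketch rather than the paper's own procedure. It assumes a common definition that the abstract does not spell out: a node's homophily ratio is the fraction of its neighbors sharing its label, and a node counts as homophilic when that ratio exceeds 0.5. The function names, the toy graph, the train/test split, and the 0.5 threshold are all illustrative assumptions.

```python
import numpy as np

def node_homophily(edges, labels):
    """Per-node fraction of neighbors with the same label (NaN if isolated).

    Assumed definition, not taken from the abstract.
    """
    labels = np.asarray(labels)
    n = labels.shape[0]
    src, dst = np.asarray(edges).T
    # Treat the graph as undirected: count every edge in both directions.
    src, dst = np.concatenate([src, dst]), np.concatenate([dst, src])
    same = (labels[src] == labels[dst]).astype(float)
    degree = np.bincount(src, minlength=n).astype(float)
    same_count = np.bincount(src, weights=same, minlength=n)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(degree > 0, same_count / degree, np.nan)

def homophily_gap(h, train_idx, test_idx):
    """Absolute difference between mean train and mean test node homophily."""
    return abs(np.nanmean(h[train_idx]) - np.nanmean(h[test_idx]))

# Hypothetical toy graph: five nodes, two classes.
labels = [0, 0, 0, 1, 1]
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (0, 4), (1, 4)]

h = node_homophily(edges, labels)
is_homophilic = h > 0.5  # assumed threshold separating the two node sets

# Hypothetical split: train on the homophilic nodes, test on the rest.
gap = homophily_gap(h, train_idx=[0, 1, 2], test_idx=[3, 4])
print(h, is_homophilic, gap)
```

In this toy graph, nodes 0 to 2 have a homophily ratio of 2/3 and nodes 3 and 4 have a ratio of 0, so training only on the former and testing on the latter gives a homophily ratio difference of 2/3. This is the kind of train/test mismatch the abstract names as one source of the performance disparity.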