Graphs with heterophily, in which nodes are connected to dissimilar neighbors through various patterns, have been regarded as challenging scenarios for Graph Neural Networks (GNNs). In this paper, we present a theoretical understanding of how different heterophily patterns affect GNNs by incorporating graph convolution (GC) operations into fully connected networks under the proposed Heterophilous Stochastic Block Model (HSBM), a general random graph model that can accommodate diverse heterophily patterns. First, we show that the separability gain from applying a GC operation is determined by two factors: the Euclidean distance between the neighborhood distributions and $\sqrt{\mathbb{E}\left[\operatorname{deg}\right]}$, where $\mathbb{E}\left[\operatorname{deg}\right]$ is the average node degree. This reveals that the impact of heterophily on classification must be evaluated together with the average node degree. Second, we show that topological noise has a detrimental effect on separability that is equivalent to degrading $\mathbb{E}\left[\operatorname{deg}\right]$. Finally, when multiple GC operations are applied, we show that the separability gain is determined by the normalized distance between the $l$-powered neighborhood distributions. This indicates that, in a wide range of regimes, nodes remain separable even as $l$ goes to infinity. Extensive experiments on both synthetic and real-world data verify our theory.
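The first result can be illustrated numerically. What follows is a minimal simulation under stated assumptions, not the paper's HSBM or its proof setup. It assumes a simplified two-class stochastic block model with a single homophily parameter. Gaussian node features are centered at orthonormal class means, so the distance between the aggregated class means equals the Euclidean distance between the neighborhood distributions. A single mean-aggregation GC step $D^{-1}A$ is applied without self-loops. The function names (`sample_graph`, `graph_conv`, `separability`, `simulate`) and the Fisher-style separability ratio are hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical illustration: a simplified 2-class SBM, NOT the paper's HSBM.
MEANS = [(1.0, 0.0), (0.0, 1.0)]  # orthonormal class means


def sample_graph(n_per_class, homophily, exp_degree, rng):
    """Sample a 2-class SBM whose neighborhood distribution is approximately
    (homophily, 1 - homophily) for class 0 and (1 - homophily, homophily) for class 1,
    with expected degree approximately exp_degree."""
    p_same = homophily * exp_degree / n_per_class
    p_cross = (1.0 - homophily) * exp_degree / n_per_class
    labels = [0] * n_per_class + [1] * n_per_class
    adj = [[] for _ in labels]
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            p = p_same if labels[i] == labels[j] else p_cross
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return labels, adj


def graph_conv(x, adj):
    """One mean-aggregation step: h_i = (1 / deg_i) * sum_{j in N(i)} x_j.
    Isolated nodes keep their own features (an assumption of this sketch)."""
    out = []
    for i, nbrs in enumerate(adj):
        if not nbrs:
            out.append(x[i])
            continue
        out.append(tuple(sum(x[j][k] for j in nbrs) / len(nbrs) for k in range(2)))
    return out


def separability(h, labels):
    """Distance between the two class means divided by the pooled per-coordinate std."""
    means, sq_err = [], 0.0
    for c in (0, 1):
        group = [v for v, y in zip(h, labels) if y == c]
        m = tuple(sum(v[k] for v in group) / len(group) for k in range(2))
        means.append(m)
        sq_err += sum((v[k] - m[k]) ** 2 for v in group for k in range(2))
    pooled_std = math.sqrt(sq_err / (2 * len(h)))
    return math.dist(means[0], means[1]) / pooled_std


def simulate(homophily, exp_degree, n_per_class=200, sigma=1.0, seed=0):
    """Return (separability of raw features, separability after one GC step)."""
    rng = random.Random(seed)
    labels, adj = sample_graph(n_per_class, homophily, exp_degree, rng)
    x = [tuple(mu + rng.gauss(0.0, sigma) for mu in MEANS[y]) for y in labels]
    return separability(x, labels), separability(graph_conv(x, adj), labels)


if __name__ == "__main__":
    for h in (0.2, 0.5, 0.8):
        for d in (5, 40):
            raw, gc = simulate(h, d)
            print(f"homophily={h} E[deg]={d} raw={raw:.2f} after GC={gc:.2f}")
```

In this sketch, `simulate(0.2, 40)` and `simulate(0.8, 40)` should give similar post-GC separability because their neighborhood distributions are equally far apart. `simulate(0.2, 5)` should give a noticeably smaller value, and `simulate(0.5, 40)` should give a value near zero. After aggregation, the within-class standard deviation shrinks roughly like $1/\sqrt{\mathbb{E}[\operatorname{deg}]}$ while the gap between class means stays at the distance between neighborhood distributions, which is the intuition behind the $\sqrt{\mathbb{E}[\operatorname{deg}]}$ factor.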