Graph neural networks (GNNs) have shown great prowess in learning representations suitable for numerous graph-based machine learning tasks. When applied to semi-supervised node classification, GNNs are widely believed to work well because of the homophily assumption ("like attracts like"), and to fail to generalize to heterophilous graphs, where dissimilar nodes connect. Recent works design new architectures to overcome such heterophily-related limitations, citing poor baseline performance and the improvements of the new architectures on a few heterophilous graph benchmark datasets as evidence for this notion. In our experiments, we find that standard graph convolutional networks (GCNs) can actually achieve better performance than such carefully designed methods on some commonly used heterophilous graphs. This motivates us to reconsider whether homophily is truly necessary for good GNN performance. We find that it is not strictly necessary: GCNs can achieve strong performance on heterophilous graphs under certain conditions. Our work carefully characterizes these conditions and provides supporting theoretical understanding and empirical observations. Finally, we examine existing heterophilous graph benchmarks and, based on this understanding, reconcile how the GCN performs or underperforms on each of them.