On the Topology Awareness and Generalization Performance of Graph Neural Networks

Many computer vision and machine learning problems are modelled as learning tasks on graphs, where graph neural networks (GNNs) have emerged as a dominant tool for learning representations of graph-structured data. A key feature of GNNs is their use of graph structures as input, which enables them to exploit the graph's inherent topological properties; this capability is known as the topology awareness of GNNs. Despite the empirical successes of GNNs, the influence of topology awareness on generalization performance remains unexplored, particularly for node-level tasks, which diverge from the assumption that data are independent and identically distributed (IID). The precise definition and characterization of the topology awareness of GNNs, especially with respect to different topological features, are still unclear. This paper introduces a comprehensive framework to characterize the topology awareness of GNNs across any topological feature. Using this framework, we investigate the effects of topology awareness on GNN generalization performance. Contrary to the prevailing belief that enhancing the topology awareness of GNNs is always advantageous, our analysis reveals a critical insight: improving the topology awareness of GNNs may inadvertently lead to unfair generalization across structural groups, which might not be desired in some scenarios. Additionally, we conduct a case study using an intrinsic graph metric, the shortest path distance, on various benchmark datasets. The empirical results of this case study confirm our theoretical insights. Moreover, we demonstrate the practical applicability of our framework by using it to tackle the cold-start problem in graph active learning.
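
The abstract does not spell out how structural groups are formed for the shortest path distance case study. One natural reading, assumed here rather than taken from the paper, is to group nodes by their hop distance to the nearest labeled training node and then compare accuracy across those groups. The sketch below does this with a multi-source breadth-first search on an unweighted adjacency list; the function names `spd_to_training_set`, `group_by_distance` and `per_group_accuracy`, as well as the toy graph and labels, are hypothetical illustrations.

```python
from collections import defaultdict, deque


def spd_to_training_set(adj, train_nodes):
    """Hypothetical helper: multi-source BFS giving each node's shortest path
    distance (in hops) to its nearest training node. Unreachable nodes are omitted."""
    dist = {v: 0 for v in train_nodes}
    queue = deque(train_nodes)
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:  # first visit is the shortest distance in an unweighted graph
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def group_by_distance(dist):
    """Hypothetical helper: structural groups keyed by distance to the training set."""
    groups = defaultdict(list)
    for node, d in dist.items():
        groups[d].append(node)
    return {d: sorted(nodes) for d, nodes in sorted(groups.items())}


def per_group_accuracy(groups, y_true, y_pred):
    """Hypothetical helper: accuracy within each structural group, so that
    generalization gaps between groups can be compared."""
    return {
        d: sum(y_true[v] == y_pred[v] for v in nodes) / len(nodes)
        for d, nodes in groups.items()
    }


# Toy graph (made up): path 0-1-2-3-4, an extra leaf 5 on node 1, isolated node 6.
adj = {0: [1], 1: [0, 2, 5], 2: [1, 3], 3: [2, 4], 4: [3], 5: [1], 6: []}
dist = spd_to_training_set(adj, [0])  # node 0 is the only labeled training node
groups = group_by_distance(dist)
print(groups)

# Made-up labels and predictions, used only to exercise the function.
y_true = {0: 0, 1: 0, 2: 1, 3: 1, 4: 0, 5: 0}
y_pred = {0: 0, 1: 0, 2: 1, 3: 0, 4: 1, 5: 1}
print(per_group_accuracy(groups, y_true, y_pred))
```

Breadth-first search is used because, on an unweighted graph, the first time a node is reached is along a shortest path, so a single pass from all training nodes at once yields the distance to the nearest one.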
