Many computer vision and machine learning problems are modelled as learning tasks on graphs, where graph neural networks (GNNs) have emerged as a dominant tool for learning representations of graph-structured data. A key feature of GNNs is that they take graph structure as input, which enables them to exploit the inherent topological properties of graphs, a capability known as the topology awareness of GNNs. Despite the empirical successes of GNNs, the influence of topology awareness on generalization performance remains underexplored, particularly for node-level tasks, which depart from the assumption that data are independent and identically distributed (I.I.D.). The precise definition and characterization of the topology awareness of GNNs, especially with respect to different topological features, also remain unclear. This paper introduces a comprehensive framework for characterizing the topology awareness of GNNs with respect to any topological feature. Using this framework, we investigate how topology awareness affects the generalization performance of GNNs. Contrary to the prevailing belief that enhancing the topology awareness of GNNs is always advantageous, our analysis reveals a critical insight: improving the topology awareness of GNNs may inadvertently lead to unfair generalization across structural groups, which may be undesirable in some scenarios. Additionally, we conduct a case study on various benchmark datasets using an intrinsic graph metric, the shortest-path distance. The empirical results of this case study confirm our theoretical insights. Moreover, we demonstrate the practical applicability of our framework by using it to tackle the cold-start problem in graph active learning.
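The abstract does not say exactly how the shortest-path distance defines structural groups. One plausible reading, which is an assumption here, is to measure each node's hop distance to the nearest labeled training node and bucket nodes by that distance. Per-group accuracy could then be compared to check for unfair generalization. The sketch below does this with a multi-source breadth-first search on an unweighted adjacency list. The names `distances_to_set` and `group_by_distance` are hypothetical and are not taken from the paper.

```python
from collections import deque


def distances_to_set(adj, sources):
    """Multi-source BFS (hypothetical helper): hop distance from every node
    to the nearest node in `sources`. Unreachable nodes map to None."""
    dist = {v: None for v in adj}
    queue = deque()
    for s in sources:
        if dist[s] is None:  # skip duplicate sources
            dist[s] = 0
            queue.append(s)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] is None:  # first visit gives the shortest hop count
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def group_by_distance(dist):
    """Bucket nodes into structural groups keyed by their distance
    (None collects nodes unreachable from the training set)."""
    groups = {}
    for v, d in dist.items():
        groups.setdefault(d, []).append(v)
    return groups


# Toy graph (assumed for illustration): path 0-1-2-3 plus isolated node 4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: []}
train_nodes = [0, 3]
groups = group_by_distance(distances_to_set(adj, train_nodes))
print(groups)  # {0: [0, 3], 1: [1, 2], None: [4]}
```

Multi-source BFS computes all distances to the training set in a single O(|V| + |E|) pass instead of running one search per training node. A weighted variant would need Dijkstra's algorithm instead.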