We investigate size-induced distribution shifts in graphs and assess their impact on the ability of graph neural networks (GNNs) to generalize to graphs larger than those seen during training. Existing literature presents conflicting conclusions on GNNs' size generalizability, primarily due to disparities in application domains and in the underlying assumptions about size-induced distribution shifts. Motivated by this, we take a data-driven approach: we focus on real biological datasets and seek to characterize the types of size-induced distribution shifts. Diverging from prior approaches, we adopt a spectral perspective and find that size-induced differences in the graph spectrum are related to differences in subgraph patterns (e.g., average cycle lengths). While previous studies have shown that the inability of GNNs to capture subgraph information negatively impacts their in-distribution generalization, our findings further show that this negative impact is more pronounced when evaluating on test graphs larger than any encountered during training. Based on these spectral insights, we introduce a simple yet effective model-agnostic strategy that makes GNNs aware of these important subgraph patterns to enhance their size generalizability. Our empirical results show that the proposed size-insensitive attention strategy substantially enhances graph classification performance on large test graphs, which are 2-10 times larger than the training graphs, improving F1 scores by up to 8%.
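To make the spectral perspective concrete, the following minimal sketch compares two graphs of different sizes by their normalized Laplacian eigenvalue distributions and by their average cycle length. It is an illustration under stated assumptions, not the procedure used in this work: the helper names (`ring_graph`, `normalized_laplacian_spectrum`, `fundamental_cycle_lengths`, `spectral_distance`), the use of fundamental cycles from a BFS spanning tree as the cycle statistic, and the quantile-based distance between spectra are all choices made here for illustration.

```python
from collections import deque

import numpy as np


def ring_graph(n):
    """Adjacency matrix of an undirected cycle on n >= 3 nodes (toy example)."""
    adj = np.zeros((n, n), dtype=int)
    idx = np.arange(n)
    adj[idx, (idx + 1) % n] = 1
    adj[(idx + 1) % n, idx] = 1
    return adj


def normalized_laplacian_spectrum(adj):
    """Sorted eigenvalues of I - D^{-1/2} A D^{-1/2}.

    Assumes an undirected graph given as a symmetric adjacency matrix
    with no isolated nodes.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(deg)
    lap = np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    return np.linalg.eigvalsh(lap)


def fundamental_cycle_lengths(adj):
    """Lengths of the fundamental cycles induced by a BFS spanning forest.

    Each non-tree edge (u, v) closes exactly one cycle whose length is the
    tree distance between u and v plus one.
    """
    adj = np.asarray(adj)
    n = len(adj)
    parent = [-1] * n
    depth = [-1] * n
    for root in range(n):
        if depth[root] != -1:
            continue
        depth[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(adj[u]):
                v = int(v)
                if depth[v] == -1:
                    depth[v] = depth[u] + 1
                    parent[v] = u
                    queue.append(v)
    lengths = []
    for u in range(n):
        for v in range(u + 1, n):
            if adj[u, v] and parent[u] != v and parent[v] != u:
                # Walk both endpoints up the tree until they meet.
                a, b, length = u, v, 1
                while a != b:
                    if depth[a] >= depth[b]:
                        a = parent[a]
                    else:
                        b = parent[b]
                    length += 1
                lengths.append(length)
    return lengths


def spectral_distance(adj_a, adj_b, num_quantiles=101):
    """Mean absolute gap between eigenvalue quantiles of two graphs.

    This approximates a 1-Wasserstein distance between the two empirical
    eigenvalue distributions, so graphs of different sizes are comparable.
    """
    q = np.linspace(0.0, 1.0, num_quantiles)
    qa = np.quantile(normalized_laplacian_spectrum(adj_a), q)
    qb = np.quantile(normalized_laplacian_spectrum(adj_b), q)
    return float(np.mean(np.abs(qa - qb)))


if __name__ == "__main__":
    small, large = ring_graph(6), ring_graph(24)
    print("avg cycle length (small):", np.mean(fundamental_cycle_lengths(small)))
    print("avg cycle length (large):", np.mean(fundamental_cycle_lengths(large)))
    print("spectral distance:", spectral_distance(small, large))
```

The ring graphs are only a toy stand-in for real data: their single cycle grows with the node count, which makes the link between size, cycle length, and spectrum easy to inspect. On real datasets one would compute the same statistics per graph and compare their distributions across size groups.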