SpotTarget: Rethinking the Effect of Target Edges for Link Prediction in Graph Neural Networks

Graph Neural Networks (GNNs) have demonstrated promising results across various tasks, including node classification and link prediction. Despite their success in high-impact applications, we identify three common pitfalls in message passing for link prediction. In prevalent GNN frameworks (e.g., DGL and PyTorch-Geometric), the target edges (i.e., the edges being predicted) are always present as message passing edges in the graph during training. This causes two problems, overfitting and distribution shift, both of which hurt generalization to the test target edges. The third problem arises at test time: failing to exclude the test target edges leads to implicit test leakage through neighborhood aggregation. In this paper, we analyze these three pitfalls and investigate how including or excluding target edges during training and testing affects performance on nodes of varying degree. Our theoretical and empirical analysis demonstrates that low-degree nodes are more susceptible to these pitfalls, which can have detrimental consequences when GNNs are deployed in production systems. To address the pitfalls systematically, we propose SpotTarget, an effective and efficient GNN training framework. During training, SpotTarget leverages our insight regarding low-degree nodes and excludes train target edges connected to at least one low-degree node. At test time, it emulates real-world production use of GNNs and excludes all test target edges. Our experiments on diverse real-world datasets demonstrate that SpotTarget significantly enhances GNNs, achieving up to a 15x increase in accuracy on sparse graphs. Furthermore, SpotTarget consistently and dramatically improves performance for low-degree nodes in dense graphs.


