Although link prediction on graphs has achieved great success with the development of graph neural networks (GNNs), its robustness to edge noise remains underexplored. To close this gap, we first conduct an empirical study showing that edge noise perturbs both the input topology and the target labels, and that this bilateral perturbation causes severe performance degradation and representation collapse. To address this problem, we propose an information-theory-guided principle, Robust Graph Information Bottleneck (RGIB), which extracts reliable supervision signals and avoids representation collapse. Unlike the basic information bottleneck, RGIB further decouples and balances the mutual dependence among graph topology, target labels, and representation, yielding new learning objectives for representations that are robust to bilateral noise. We explore two instantiations, RGIB-SSL and RGIB-REP, which use self-supervised learning and data reparameterization for implicit and explicit data denoising, respectively. Extensive experiments on six datasets and three GNNs under diverse noisy scenarios verify the effectiveness of both RGIB instantiations. The code is publicly available at: https://github.com/tmlr-group/RGIB.