Graph Neural Networks (GNNs) exhibit strong potential in node classification tasks through a message-passing mechanism. However, their performance often hinges on high-quality node labels, which are challenging to obtain in real-world scenarios due to unreliable sources or adversarial attacks. Consequently, label noise is common in real-world graph data, and it degrades GNNs because incorrect label information is propagated during training. To address this issue, the study of Graph Neural Networks under Label Noise (GLN) has recently gained traction. However, because existing studies differ in dataset selection, data splitting, and preprocessing techniques, the community currently lacks a comprehensive benchmark, which impedes deeper understanding and further development of GLN. To fill this gap, we introduce NoisyGL, the first comprehensive benchmark for graph neural networks under label noise. NoisyGL enables fair comparisons and detailed analyses of GLN methods on graph data with noisy labels across various datasets, with unified experimental settings and a unified interface. Our benchmark has uncovered several important insights that were missed in previous research, and we believe these findings will be highly beneficial for future studies. We hope our open-source benchmark library will foster further advancements in this field. The code of the benchmark is available at https://github.com/eaglelab-zju/NoisyGL.