Label noise is a common challenge in large datasets, as it can significantly degrade the generalization ability of deep neural networks. Most existing studies focus on noisy labels in computer vision; however, graph models take both node features and graph topology as input, and their message-passing mechanisms make them more susceptible to label noise. Only a few recent works tackle label noise on graphs. A major limitation of these works is that they assume the graph is homophilous and that labels are smoothly distributed. Real-world graphs, however, may contain varying degrees of heterophily or even be heterophily-dominated, which makes current methods inadequate. In this paper, we study graph label noise in the context of arbitrary heterophily, with the aim of rectifying noisy labels and assigning labels to previously unlabeled nodes. We begin by conducting two empirical analyses to explore the impact of graph homophily on graph label noise. Based on our observations, we propose a simple yet efficient algorithm, denoted as LP4GLN. Specifically, LP4GLN is an iterative algorithm with three steps: (1) reconstruct the graph to recover the homophily property, (2) use label propagation to rectify the noisy labels, and (3) select high-confidence labels to retain for the next iteration. By iterating these steps, we obtain a set of correct labels, ultimately achieving high accuracy in the node classification task. We also provide a theoretical analysis to demonstrate its denoising effect. Finally, we conduct experiments on 10 benchmark datasets under varying levels of graph heterophily and different noise types, comparing LP4GLN with 7 typical baselines. Our results demonstrate the superior performance of the proposed LP4GLN.
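The three-step loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the graph is reconstructed, so the sketch assumes a symmetric k-nearest-neighbor graph over cosine similarity of node features as a stand-in, uses the classic propagation rule F <- alpha * S F + (1 - alpha) * Y, and keeps labels whose normalized score exceeds a fixed threshold. All function names and parameters (`knn_graph`, `propagate`, `denoise_labels`, `k`, `alpha`, `threshold`, `n_rounds`) are hypothetical.

```python
import numpy as np


def knn_graph(X, k):
    """Step 1 (assumed stand-in): symmetric k-NN graph on cosine similarity."""
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)              # never pick a node as its own neighbor
    idx = np.argsort(-S, axis=1)[:, :k]       # k most similar nodes per row
    A = np.zeros_like(S)
    rows = np.repeat(np.arange(X.shape[0]), k)
    A[rows, idx.ravel()] = 1.0
    return np.maximum(A, A.T)                 # symmetrize


def propagate(A, Y, alpha=0.9, n_steps=50):
    """Step 2: label propagation F <- alpha * S F + (1 - alpha) * Y."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    S = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]   # D^{-1/2} A D^{-1/2}
    F = Y.copy()
    for _ in range(n_steps):
        F = alpha * (S @ F) + (1 - alpha) * Y
    return F


def denoise_labels(X, noisy_labels, labeled_mask, n_classes,
                   k=5, threshold=0.6, n_rounds=3):
    """Iterate reconstruct -> propagate -> select for a fixed number of rounds."""
    labels = noisy_labels.copy()
    keep = labeled_mask.copy()
    for _ in range(n_rounds):
        # Rebuilt each round to mirror the three-step loop, although this
        # feature-only stand-in yields the same graph every time.
        A = knn_graph(X, k)
        # One-hot seed matrix built only from the currently retained labels.
        Y = np.zeros((X.shape[0], n_classes))
        idx = np.flatnonzero(keep)
        Y[idx, labels[idx]] = 1.0
        F = propagate(A, Y)
        P = F / (F.sum(axis=1, keepdims=True) + 1e-12)  # per-node class scores
        labels = P.argmax(axis=1)
        # Step 3: retain only high-confidence labels for the next round.
        keep = P.max(axis=1) >= threshold
    return labels, keep
```

Calling `denoise_labels(X, noisy_labels, labeled_mask, n_classes)` returns a label for every node, including initially unlabeled ones, together with a boolean mask of the labels retained as high-confidence in the final round. The fixed threshold and fixed round count are simplifications chosen for brevity.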