Deep learning has achieved remarkable success in graph-related tasks, yet this success relies heavily on large-scale, high-quality annotated datasets. Acquiring such datasets can be prohibitively expensive, so in practice labels are often obtained from cheaper sources such as web searches and user tags. Unfortunately, these labels are often noisy, which compromises the generalization performance of deep networks. To tackle this challenge and make deep learning models more robust to label noise in graph-based tasks, we propose a method called ERASE (Error-Resilient representation learning on graphs for lAbel noiSe tolerancE). The core idea of ERASE is to learn error-tolerant representations by maximizing coding rate reduction. In particular, we introduce a decoupled label propagation method for learning representations. Before training, noisy labels are pre-corrected through structural denoising. During training, ERASE combines prototype pseudo-labels with propagated denoised labels and updates representations with error resilience, which significantly improves generalization performance in node classification. The proposed method withstands errors caused by mislabeled nodes more effectively, thereby strengthening the robustness of deep networks on noisy graph data. Extensive experiments show that our method outperforms multiple baselines by clear margins across a broad range of noise levels and scales well. Code is released at https://github.com/eraseai/erase.
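To make the phrase "maximizing coding rate reduction" concrete, the following is a minimal sketch of the coding rate reduction objective as it is commonly defined in the MCR² literature: the coding rate of all representations minus the size-weighted coding rates of each class. The abstract does not state the exact loss used by ERASE, so the formula, the function names `coding_rate` and `coding_rate_reduction`, and the distortion parameter `eps` are assumptions for illustration only, not the released implementation.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """Coding rate R(Z, eps) = 1/2 * logdet(I + d / (n * eps^2) * Z^T Z).

    Z has shape (n, d): n node representations of dimension d.
    """
    n, d = Z.shape
    gram = Z.T @ Z  # (d, d) second-moment matrix, unnormalized
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps**2)) * gram)
    return 0.5 * logdet

def coding_rate_reduction(Z, labels, eps=0.5):
    """Rate of the whole set minus the size-weighted rates of each class."""
    n = Z.shape[0]
    per_class = 0.0
    for c in np.unique(labels):
        Zc = Z[labels == c]
        per_class += (Zc.shape[0] / n) * coding_rate(Zc, eps)
    return coding_rate(Z, eps) - per_class

# Toy data (hypothetical): two classes lying in orthogonal directions.
Z = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
clean = coding_rate_reduction(Z, np.array([0, 0, 1, 1]))
noisy = coding_rate_reduction(Z, np.array([0, 1, 0, 1]))
print(clean, noisy)
```

In this toy case the labeling that matches the geometry yields a larger rate reduction than the mislabeled one, which gives some intuition for why an objective of this kind can be informative under label noise.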