Graph anomaly detection (GAD) is widely applied in areas such as financial fraud detection and social spammer detection. Anomalous nodes not only affect their own communities but also have a ripple effect on neighbors throughout the graph structure, and detecting them in complex graphs remains a challenging task. Existing GAD methods assume that all labels are correct, yet real-world annotations are often inaccurate. Such noisy labels can severely degrade GAD performance: because anomalies form a minority class, even a small number of mislabeled instances can disproportionately interfere with detection models. Cutting edges can mitigate the negative effects of noisy labels, but it has both positive and negative influences and must be carried out under weak supervision. To perform effective GAD with noisy labels, we propose the REinforced Graph Anomaly Detector (REGAD), which prunes the edges of candidate nodes that potentially carry mistaken labels. We further design performance feedback based on strategically crafted confident labels to guide the cutting process. Specifically, REGAD contains two novel components: (i) a tailored policy network whose two-step actions remove negative effect propagation step by step, and (ii) a policy-in-the-loop mechanism that identifies suitable edge-removal strategies to control the propagation of noise on the graph and estimates the updated structure to obtain reliable pseudo labels iteratively. Experiments on three real-world datasets demonstrate that REGAD outperforms all baselines under different noise ratios.
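The edge-pruning idea in the abstract can be illustrated with a minimal sketch. It is not REGAD itself: the learned policy network and the confident-label feedback are replaced here by a fixed, hypothetical threshold rule, and the names `find_candidates`, `prune_edges`, `scores`, and `threshold` are assumptions introduced only for illustration. A labeled node becomes a candidate when its given label disagrees with a detector score, and every edge touching a candidate is removed.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def find_candidates(labels: Dict[int, int], scores: Dict[int, float],
                    threshold: float = 0.5) -> Set[int]:
    """Flag labeled nodes whose given label disagrees with a detector score.

    Assumption (not from the paper): a score >= threshold predicts
    "anomalous" (1), anything lower predicts "normal" (0).
    """
    return {
        node for node, label in labels.items()
        if int(scores[node] >= threshold) != label
    }


def prune_edges(edges: List[Edge], candidates: Set[int]) -> List[Edge]:
    """Drop every edge that touches a candidate node."""
    return [(u, v) for u, v in edges
            if u not in candidates and v not in candidates]


# Toy graph: a path 0-1-2-3-4. Node 4 is unlabeled.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
labels = {0: 0, 1: 0, 2: 1, 3: 0}
scores = {0: 0.1, 1: 0.9, 2: 0.8, 3: 0.2, 4: 0.3}

candidates = find_candidates(labels, scores)
pruned = prune_edges(edges, candidates)
print(candidates)  # {1}
print(pruned)      # [(2, 3), (3, 4)]
```

Node 1 is labeled normal but scored as anomalous, so both of its edges are cut. This also isolates node 0, a small example of the negative influence of cutting that the abstract mentions and that a learned policy would need to weigh.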