This paper considers an important Graph Anomaly Detection (GAD) task, namely open-set GAD, which aims to detect anomalous nodes using a small number of labelled normal and anomaly nodes (the latter known as seen anomalies) that cannot illustrate all possible inference-time abnormalities. The availability of such labelled data provides crucial prior knowledge about abnormalities for GAD models, enabling substantially reduced detection errors. However, current methods tend to over-emphasise fitting the seen anomalies, leading to weak generalisation to unseen anomalies, i.e., those that are not illustrated by the labelled anomaly nodes. Further, they were introduced to handle Euclidean data and so fail to effectively capture important non-Euclidean features for GAD. In this work, we propose a novel open-set GAD approach, namely Normal Structure Regularisation (NSReg), to generalise detection to unseen anomalies while remaining effective at detecting seen anomalies. The key idea in NSReg is to introduce a regularisation term that enforces the learning of compact, semantically rich representations of normal nodes based on their structural relations to other nodes. When optimised jointly with supervised anomaly detection losses, the regularisation term helps incorporate strong normality into the modelling, enabling the joint learning of both seen abnormality and normality of the nodes, and thus effectively avoids over-emphasis on solely fitting the seen anomalies during training. Extensive empirical results on six real-world datasets demonstrate the superiority of NSReg for open-set GAD.
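
To make the joint objective described above concrete, the following is a minimal sketch of a supervised anomaly detection loss combined with a structure-based regulariser on normal nodes. It is an illustration under stated assumptions, not the NSReg formulation itself: the function names, the weight `lam`, and in particular the regulariser (mean squared distance between embeddings of connected normal nodes) are hypothetical stand-ins chosen only to show how such a term could be added to a supervised loss.

```python
import numpy as np


def bce_loss(scores, labels):
    """Binary cross-entropy between anomaly scores in (0, 1) and 0/1 labels."""
    eps = 1e-12
    scores = np.clip(scores, eps, 1.0 - eps)
    return float(-np.mean(labels * np.log(scores)
                          + (1.0 - labels) * np.log(1.0 - scores)))


def structure_regulariser(emb, edges, normal_mask):
    """Hypothetical stand-in regulariser (an assumption, not the paper's term).

    Returns the mean squared distance between embeddings of connected node
    pairs in which both endpoints are labelled normal, which encourages
    compact representations of structurally related normal nodes.
    """
    src, dst = edges[:, 0], edges[:, 1]
    keep = normal_mask[src] & normal_mask[dst]  # edges between two normal nodes
    if not keep.any():
        return 0.0
    diff = emb[src[keep]] - emb[dst[keep]]
    return float(np.mean(np.sum(diff ** 2, axis=1)))


def joint_loss(scores, labels, emb, edges, normal_mask, lam=1.0):
    """Supervised loss plus lam times the structure regulariser (lam is assumed)."""
    return bce_loss(scores, labels) + lam * structure_regulariser(
        emb, edges, normal_mask)
```

In this sketch the regulariser depends only on normal nodes and the graph structure, so it contributes a training signal that does not come from the seen anomalies; this is the sense in which such a term can counteract over-fitting to them.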