Graph anomaly detection (GAD) is a vital task, since even a few anomalies can pose serious threats to benign users. Recent semi-supervised GAD methods, which leverage the available labels as prior knowledge, outperform unsupervised methods. In practice, people often need to identify anomalies on new (sub)graphs to secure their business, but they may lack the labels needed to train an effective detection model. One natural idea is to directly apply a trained GAD model to the new (sub)graph at test time. However, we find that existing semi-supervised GAD methods generalize poorly: well-trained models do not perform well on an unseen area (i.e., one not accessible during training) of the same graph. Such failures can have serious consequences in practice. Motivated by this phenomenon, we propose a general and novel research problem, generalized graph anomaly detection, which aims to effectively identify anomalies on both the training-domain graph and the unseen testing graph to eliminate potential dangers. This task is challenging because only limited labels are available and the normal background may differ between training and testing data. Accordingly, we propose a data augmentation method named \textit{AugAN} (\uline{Aug}mentation for \uline{A}nomaly and \uline{N}ormal distributions) to enrich the training data and boost the generalizability of GAD models. Experiments verify the effectiveness of our method in improving model generalizability.