Graph anomaly detection (GAD) is a challenging binary classification problem because anomalies and normal nodes follow different structural distributions: anomalous nodes are a minority and therefore exhibit higher heterophily and lower homophily than normal nodes. Furthermore, due to various time factors and the annotation preferences of human experts, the degree of heterophily and homophily can change between training and testing data, which we call structural distribution shift (SDS) in this paper. Mainstream methods are built on graph neural networks (GNNs); they improve the classification of normal nodes by aggregating homophilous neighbors, yet they ignore the SDS issue for anomalies and consequently generalize poorly. This work addresses the problem from a feature view. We observe that the degree of SDS differs between anomalies and normal nodes. The key to addressing the issue therefore lies in resisting high heterophily for anomalies while still letting the learning of normal nodes benefit from homophily. We tease out the anomaly features and constrain them to mitigate the effect of heterophilous neighbors and make them invariant. We term the proposed framework Graph Decomposition Network (GDN). Extensive experiments on two benchmark datasets show that the proposed framework achieves a remarkable performance boost in GAD, especially in an SDS environment where the structural distribution of anomalies differs substantially between training and testing environments. The code is open-sourced at https://github.com/blacksingular/wsdm_GDN.
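To make the notion of structural distribution shift concrete, the following is a minimal sketch that measures per-node heterophily on a toy graph and compares it across a training split and a testing split. It assumes a common node-level definition (the fraction of a node's neighbors whose label differs from its own), which may not match the exact measure used in the paper. The graph, the labels, the splits, and the helper names `node_heterophily` and `mean_heterophily` are all hypothetical and chosen only for illustration; nothing here is taken from the GDN code base.

```python
from statistics import mean


def node_heterophily(edges, labels):
    """Return, for each non-isolated node, the fraction of its neighbors
    whose label differs from the node's own label (assumed definition)."""
    neighbors = {node: set() for node in labels}
    for u, v in edges:
        # Treat the graph as undirected.
        neighbors[u].add(v)
        neighbors[v].add(u)

    scores = {}
    for node, nbrs in neighbors.items():
        if nbrs:  # heterophily is undefined for isolated nodes
            differing = sum(labels[n] != labels[node] for n in nbrs)
            scores[node] = differing / len(nbrs)
    return scores


def mean_heterophily(scores, nodes):
    """Average heterophily over a group of nodes, skipping isolated ones."""
    values = [scores[n] for n in nodes if n in scores]
    return mean(values) if values else float("nan")


# Hypothetical toy data: label 1 marks an anomaly, label 0 a normal node.
labels = {0: 1, 1: 1, 2: 0, 3: 0, 4: 0, 5: 1, 6: 0, 7: 0}
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4),
         (2, 4), (5, 6), (5, 7), (6, 7), (4, 6)]

# Hypothetical split: nodes 0-4 are seen in training, nodes 5-7 in testing.
train_anomalies, train_normals = [0, 1], [2, 3, 4]
test_anomalies, test_normals = [5], [6, 7]

scores = node_heterophily(edges, labels)

# Shift in mean heterophily between the two splits, per class.
anomaly_shift = (mean_heterophily(scores, test_anomalies)
                 - mean_heterophily(scores, train_anomalies))
normal_shift = (mean_heterophily(scores, test_normals)
                - mean_heterophily(scores, train_normals))

print(f"anomaly heterophily shift: {anomaly_shift:.3f}")
print(f"normal heterophily shift:  {normal_shift:.3f}")
```

In this toy setup the training anomalies are connected to each other, while the test anomaly is surrounded only by normal nodes, so the anomaly class shifts more than the normal class. This mirrors, in a purely illustrative way, the observation that the degree of SDS differs between anomalies and normal nodes.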