GCTAM: Global and Contextual Truncated Affinity Combined Maximization Model For Unsupervised Graph Anomaly Detection

Anomalies are common in real-world information networks and graphs; examples in social graphs include malicious users, malicious comments, banned users, and fake news. Recent graph anomaly detection methods use a mechanism called truncated affinity maximization (TAM) to detect anomalous nodes without any label information, and they achieve impressive results. TAM maximizes the affinities among normal nodes while truncating the affinities of anomalous nodes in order to identify the anomalies. However, existing TAM-based methods truncate suspicious nodes according to a rigid threshold that ignores the specificity and high-order affinities of different nodes. This inevitably leads to ineffective truncation of both normal and anomalous nodes, which limits detection performance. To address this, this paper proposes a novel truncation model that combines contextual and global affinity to truncate the anomalous nodes. The core idea is to use contextual truncation to decrease the affinity of anomalous nodes, while global truncation increases the affinity of normal nodes. Extensive experiments on real-world datasets show that our method surpasses peer methods in most graph anomaly detection tasks. In particular, compared with previous state-of-the-art methods, the proposed method achieves improvements of +15\% $\sim$ +20\% on two well-known real-world datasets, Amazon and YelpChi. Notably, our method also works well on the large datasets Amazon-all and YelpChi-all, where it achieves the best results, while most previous models cannot complete the tasks.
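To make the notion of affinity-based scoring with a rigid threshold concrete, the following is a minimal illustrative sketch, not the implementation of TAM or of the proposed model. It assumes that affinity is measured as the mean cosine similarity between a node and its neighbors, and that truncation simply drops edges whose similarity falls below one fixed threshold. The function names `cosine_similarity`, `local_affinity`, and `truncate_edges`, as well as the toy graph, are hypothetical and chosen only for illustration.

```python
import numpy as np


def cosine_similarity(features):
    """Pairwise cosine similarity between the rows of a feature matrix."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def local_affinity(features, adjacency):
    """Mean cosine similarity of each node to its neighbors (assumed affinity)."""
    similarity = cosine_similarity(features)
    degree = adjacency.sum(axis=1)
    total = (similarity * adjacency).sum(axis=1)
    # Isolated nodes (degree 0) receive affinity 0 instead of dividing by zero.
    return np.divide(total, degree, out=np.zeros_like(total, dtype=float),
                     where=degree > 0)


def truncate_edges(features, adjacency, threshold):
    """Drop every edge whose endpoint similarity is below one fixed threshold.

    This is the rigid, node-agnostic rule used here only for illustration.
    """
    keep = cosine_similarity(features) >= threshold
    return adjacency * keep


# Toy graph: nodes 0-2 have similar features; node 3 differs and links to node 0.
features = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0]])
adjacency = np.array([[0, 1, 1, 1],
                      [1, 0, 1, 0],
                      [1, 1, 0, 0],
                      [1, 0, 0, 0]])

before = local_affinity(features, adjacency)
truncated = truncate_edges(features, adjacency, threshold=0.5)
after = local_affinity(features, truncated)

print("affinity before truncation:", np.round(before, 3))
print("affinity after truncation: ", np.round(after, 3))
print("lowest-affinity node:", int(np.argmin(before)))
```

In this toy setting, the node with the lowest affinity is treated as the most suspicious, and removing the dissimilar edge raises the affinity of the normal node it was attached to. A single global threshold of this kind treats all nodes identically, which is the limitation the abstract attributes to existing TAM-based methods.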
