Multitask Active Learning for Graph Anomaly Detection

In the web era, graph machine learning has been widely applied to ubiquitous graph-structured data. As a pivotal component for bolstering web security and enhancing the robustness of graph-based applications, graph anomaly detection is becoming increasingly important. While Graph Neural Networks (GNNs) have proven effective in supervised and semi-supervised graph anomaly detection, their performance depends on the availability of sufficient ground-truth labels. Identifying anomalies in complex graph structures is labor-intensive, which makes such labels costly to obtain and poses a significant challenge in real-world applications. In contrast, indirect supervision signals from other tasks (e.g., node classification) are relatively abundant. In this paper, we propose a novel MultItask acTIve Graph Anomaly deTEction framework, namely MITIGATE. First, by coupling node classification tasks, MITIGATE gains the ability to detect out-of-distribution nodes without any known anomalies. Second, MITIGATE quantifies the informativeness of nodes by the confidence difference across tasks, so that samples with conflicting predictions contribute informative yet not excessively challenging signals to subsequent training. Finally, to increase the likelihood of selecting representative nodes that are distant from known patterns, MITIGATE adopts a masked aggregation mechanism for distance measurement that considers both the inherent features of nodes and their current labeling status. Empirical studies on four datasets demonstrate that MITIGATE significantly outperforms state-of-the-art methods for anomaly detection. Our code is publicly available at: https://github.com/AhaChang/MITIGATE.
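The abstract names two selection signals (cross-task confidence difference and a masked-aggregation distance) without giving formulas. The following is a minimal, standard-library Python sketch of how such signals *could* be combined in one active-learning round; it is not MITIGATE's published formulation, and the repository above should be consulted for the actual method. Every choice here is an assumption made for illustration: using the max-softmax probability as node-classification confidence, measuring conflict as the absolute gap between that confidence and the anomaly head's `1 - p_anomaly`, masking labeled neighbours out of a one-hop mean, using Euclidean distance to the nearest labeled node, and adding the two signals with a hypothetical `weight`. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def max_softmax(logits: Sequence[float]) -> float:
    """Largest softmax probability, a common proxy for classifier confidence."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return max(exps) / sum(exps)


def conflict_score(nc_logits: Sequence[float], p_anomaly: float) -> float:
    """Hypothetical cross-task disagreement for one node.

    The node-classification head's confidence is read as evidence that the node
    is normal; 1 - p_anomaly is the anomaly-detection head's belief in the same
    thing. A large gap means the two tasks disagree about the node.
    """
    return abs(max_softmax(nc_logits) - (1.0 - p_anomaly))


def masked_aggregate(x: List[List[float]], adj: Dict[int, List[int]],
                     labeled: Set[int]) -> List[List[float]]:
    """One-hop mean of each node and its UNLABELED neighbours (assumed masking rule)."""
    out = []
    for i, xi in enumerate(x):
        members = [i] + [j for j in adj.get(i, []) if j not in labeled]
        out.append([sum(x[j][d] for j in members) / len(members) for d in range(len(xi))])
    return out


def min_dist_to_labeled(h: List[List[float]], labeled: Set[int]) -> List[float]:
    """Euclidean distance from each node's aggregated vector to its nearest labeled node."""
    return [(min(math.dist(hi, h[j]) for j in labeled) if labeled else 0.0) for hi in h]


def select_batch(nc_logits: List[Sequence[float]], p_anomaly: List[float],
                 x: List[List[float]], adj: Dict[int, List[int]],
                 labeled: Set[int], budget: int, weight: float = 1.0) -> List[int]:
    """Rank unlabeled nodes by conflict + weight * distance; return the top `budget`."""
    h = masked_aggregate(x, adj, labeled)
    dist = min_dist_to_labeled(h, labeled)
    candidates = [i for i in range(len(x)) if i not in labeled]
    score = {i: conflict_score(nc_logits[i], p_anomaly[i]) + weight * dist[i]
             for i in candidates}
    return sorted(candidates, key=lambda i: score[i], reverse=True)[:budget]


if __name__ == "__main__":
    x = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [3.0, 3.0]]
    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    logits = [[0.0, 0.0]] * 4  # uniform NC head: no cross-task conflict signal
    p = [0.5] * 4
    print(select_batch(logits, p, x, adj, labeled={0}, budget=2))  # → [3, 2]
```

In the usage example the two heads never disagree, so the ranking falls back to distance from the labeled node 0, and the far-away nodes 3 and 2 are picked. Setting `weight=0.0` would instead rank purely by cross-task conflict. In practice the two terms live on different scales, so a real implementation would likely normalize them before combining.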