Data Augmentation for Supervised Graph Outlier Detection with Latent Diffusion Models

Graph outlier detection is a prominent research and application task in the realm of graph neural networks. It identifies outlier nodes that deviate from the majority in a graph. One of the fundamental challenges confronting supervised graph outlier detection algorithms is class imbalance, where the scarcity of outlier instances relative to normal instances often results in suboptimal performance. Conventional methods mitigate the imbalance by reweighting instances in the estimation of the loss function, assigning higher weights to outliers and lower weights to inliers. Nonetheless, upweighting outliers is prone to overfitting, while downweighting inliers is prone to underfitting. Recently, generative models, especially diffusion models, have demonstrated their efficacy in synthesizing high-fidelity images. Despite their generation quality, their potential for data augmentation in supervised graph outlier detection remains largely underexplored. To bridge this gap, we introduce GODM, a novel data augmentation method for mitigating class imbalance in supervised Graph Outlier detection with latent Diffusion Models. Specifically, our proposed method consists of three key components: (1) a Variational Encoder maps the heterogeneous information inherent in the graph data into a unified latent space; (2) a Graph Generator synthesizes, from the latent space, graph data that are statistically similar to real outliers; and (3) a Latent Diffusion Model learns the latent space distribution of real organic data by iterative denoising. Extensive experiments conducted on multiple datasets substantiate the effectiveness and efficiency of GODM. A case study further demonstrates the generation quality of our synthetic data. To foster accessibility and reproducibility, we encapsulate GODM into a plug-and-play package and release it on the Python Package Index (PyPI).
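
For concreteness, the conventional reweighting baseline described above can be sketched as a class-weighted binary cross-entropy. This is only an illustration of the baseline, not of GODM itself. The function names and the inverse-frequency weighting rule are assumptions chosen for the example, not details taken from the paper.

```python
import math


def inverse_frequency_weights(labels):
    """Return (w_outlier, w_inlier), inversely proportional to class frequency.

    Assumed heuristic for illustration: w_c = n / (2 * n_c).
    """
    n = len(labels)
    n_out = sum(labels)   # labels: 1 = outlier, 0 = inlier
    n_in = n - n_out
    return n / (2.0 * n_out), n / (2.0 * n_in)


def weighted_bce(probs, labels, w_outlier, w_inlier):
    """Class-weighted binary cross-entropy, averaged over nodes.

    probs  : predicted probability that each node is an outlier (0 < p < 1)
    labels : 1 for outlier, 0 for inlier
    """
    total = 0.0
    for p, y in zip(probs, labels):
        if y == 1:
            total += -w_outlier * math.log(p)        # outliers weighted up
        else:
            total += -w_inlier * math.log(1.0 - p)   # inliers weighted down
    return total / len(labels)


# Toy imbalanced example: 9 inliers, 1 outlier.
labels = [0] * 9 + [1]
probs = [0.1] * 9 + [0.6]

w_out, w_in = inverse_frequency_weights(labels)
loss_plain = weighted_bce(probs, labels, 1.0, 1.0)
loss_weighted = weighted_bce(probs, labels, w_out, w_in)
```

In this toy setting the single outlier dominates the weighted loss, which hints at why heavy upweighting of a few outliers can encourage overfitting to them.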
