Graph Anomaly Detection (GAD) has recently become an active research topic owing to its practical applicability and theoretical value. Because GAD is application-driven and anomalous samples are rare, enriching the variety of its datasets is fundamental work. This paper therefore presents DGraph, a real-world dynamic graph from the finance domain. DGraph overcomes many limitations of current GAD datasets: it contains about 3M nodes, 4M dynamic edges, and 1M ground-truth nodes. A comprehensive observation of DGraph reveals that anomalous and normal nodes generally differ in structure, neighbor distribution, and temporal dynamics. The observation also suggests that unlabeled nodes are essential for detecting fraudsters. Furthermore, we conduct extensive experiments on DGraph. Together, the observations and experiments demonstrate that DGraph can propel GAD research and enable in-depth exploration of anomalous nodes.