Graph machine learning (GML) has made great progress in tasks such as node classification, link prediction, and graph classification. However, real-world graphs are often structurally imbalanced: only a few hub nodes have a dense local structure and high influence. This imbalance may compromise the robustness of existing GML models, especially on tail nodes. This paper proposes a selective graph augmentation method (SAug) to address this problem. First, a PageRank-based sampling strategy is designed to identify hub nodes and tail nodes in the graph. Second, a selective augmentation strategy is proposed that drops the noisy neighbors of hub nodes and, for tail nodes, discovers latent neighbors and generates pseudo neighbors. This strategy also alleviates the structural imbalance between the two types of nodes. Finally, a GNN model is retrained on the augmented graph. Extensive experiments demonstrate that SAug significantly improves the backbone GNNs and outperforms competing graph augmentation methods and hub/tail-aware methods.
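To make the first step concrete, the following is a minimal sketch of how PageRank scores could be used to separate hub nodes from tail nodes. It is not the authors' implementation: the abstract does not specify the sampling rule, so a fixed top-fraction cutoff stands in for it here, and the function names `pagerank` and `split_hub_tail`, the `hub_fraction` parameter, and the toy star graph are all assumptions made for illustration.

```python
from collections import defaultdict


def pagerank(edges, num_nodes, damping=0.85, iters=100, tol=1e-10):
    """Power-iteration PageRank on an undirected graph given as an edge list."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)

    scores = [1.0 / num_nodes] * num_nodes
    for _ in range(iters):
        # Score held by isolated nodes is redistributed uniformly.
        dangling = sum(scores[n] for n in range(num_nodes) if not neighbors[n])
        base = (1.0 - damping) / num_nodes + damping * dangling / num_nodes
        new_scores = [base] * num_nodes
        for u in range(num_nodes):
            if neighbors[u]:
                share = damping * scores[u] / len(neighbors[u])
                for v in neighbors[u]:
                    new_scores[v] += share
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


def split_hub_tail(scores, hub_fraction=0.2):
    """Assumed rule: the top `hub_fraction` of nodes by score are hubs."""
    num_hubs = max(1, int(round(hub_fraction * len(scores))))
    ranked = sorted(range(len(scores)), key=lambda n: scores[n], reverse=True)
    return set(ranked[:num_hubs]), set(ranked[num_hubs:])


# Toy star graph: node 0 is connected to nodes 1..5.
edges = [(0, i) for i in range(1, 6)]
scores = pagerank(edges, num_nodes=6)
hubs, tails = split_hub_tail(scores, hub_fraction=0.2)
```

On the star graph the center node receives the largest score and is the only node labeled as a hub. In a selective augmentation pipeline of the kind the abstract describes, the `hubs` set would then be the target of neighbor dropping and the `tails` set the target of neighbor discovery and generation; those steps are not sketched here because the abstract gives no detail on how they are performed.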