Efficient Topology-aware Data Augmentation for High-Degree Graph Neural Networks

In recent years, graph neural networks (GNNs) have emerged as a potent tool for learning on graph-structured data and have achieved notable success across a wide range of fields. The majority of GNNs follow the message-passing paradigm, in which the representation of each node is learned by recursively aggregating the features of its neighbors. However, this mechanism causes severe over-smoothing and efficiency problems on high-degree graphs (HDGs), in which most nodes have dozens or even hundreds of neighbors; examples include social networks, transaction graphs, and power grids. Additionally, such graphs usually carry rich and complex structural semantics, which are hard to capture through feature aggregation alone. Motivated by these limitations, we propose TADA, an efficient and effective front-mounted data augmentation framework for GNNs on HDGs. Under the hood, TADA includes two key modules: (i) feature expansion with structure embeddings, and (ii) topology- and attribute-aware graph sparsification. The first module encodes the graph structure into high-quality structure embeddings with our highly efficient sketching method, yielding augmented node features and enhanced model capacity. The second module exploits task-relevant features extracted from the graph structure and node attributes to accurately identify and remove numerous redundant or noisy edges from the input graph, thereby alleviating over-smoothing and speeding up feature aggregation over HDGs. Empirically, TADA considerably improves the predictive performance of mainstream GNN models on node classification over 8 real homophilic/heterophilic HDGs, while keeping training and inference efficient.
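
To make the two modules more concrete, the following NumPy sketch shows one way such a preprocessing pipeline could be arranged: it expands node features with sketched structure embeddings, then prunes low-scoring edges before a single mean-aggregation step. It is an illustration under stated assumptions, not TADA's actual algorithm: the Gaussian random projection stands in for the paper's sketching method, cosine similarity of the augmented features stands in for its topology- and attribute-aware edge scoring, and all function names and parameters (`structure_embeddings`, `expand_features`, `sparsify`, `keep_ratio`) are hypothetical.

```python
import numpy as np

def structure_embeddings(adj, dim, rng):
    """Assumed stand-in for sketching: project the row-normalized
    adjacency matrix to `dim` columns with a Gaussian random matrix."""
    deg = adj.sum(axis=1, keepdims=True)
    norm_adj = adj / np.maximum(deg, 1.0)
    proj = rng.standard_normal((adj.shape[0], dim)) / np.sqrt(dim)
    return norm_adj @ proj

def expand_features(x, adj, dim, rng):
    """Module (i), illustrative: concatenate attributes with structure embeddings."""
    return np.concatenate([x, structure_embeddings(adj, dim, rng)], axis=1)

def sparsify(adj, feats, keep_ratio):
    """Module (ii), illustrative: keep the fraction of undirected edges whose
    endpoints have the most similar augmented features (assumed scoring rule)."""
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / np.maximum(norms, 1e-12)
    rows, cols = np.nonzero(np.triu(adj, k=1))        # each undirected edge once
    scores = np.sum(unit[rows] * unit[cols], axis=1)  # cosine similarity per edge
    n_keep = max(1, int(round(keep_ratio * len(rows))))
    top = np.argsort(-scores)[:n_keep]
    out = np.zeros_like(adj)
    out[rows[top], cols[top]] = 1.0
    out[cols[top], rows[top]] = 1.0                   # keep the graph symmetric
    return out

def mean_aggregate(adj, x):
    """One message-passing step: average the features of each node's neighbors."""
    deg = adj.sum(axis=1, keepdims=True)
    return (adj @ x) / np.maximum(deg, 1.0)

# Toy high-degree graph: 60 nodes, edge probability 0.5, 8 random attributes.
rng = np.random.default_rng(0)
n, d = 60, 8
upper = np.triu(rng.random((n, n)) < 0.5, k=1).astype(float)
adj = upper + upper.T
x = rng.standard_normal((n, d))

x_aug = expand_features(x, adj, dim=16, rng=rng)      # 8 attributes + 16 structure dims
adj_sparse = sparsify(adj, x_aug, keep_ratio=0.3)     # drop roughly 70% of the edges
h = mean_aggregate(adj_sparse, x_aug)                 # aggregation on the pruned graph

print(x_aug.shape, int(adj.sum() // 2), int(adj_sparse.sum() // 2))
```

Because both steps run before any model is trained, the augmented features and the pruned adjacency matrix can in principle be handed to any message-passing GNN unchanged, which is the sense in which the framework is described as front-mounted.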
