Unifying gradient regularization for Heterogeneous Graph Neural Networks

Heterogeneous Graph Neural Networks (HGNNs) are a class of powerful deep learning methods widely used to learn representations of heterogeneous graphs. Despite their rapid development, HGNNs still face challenges such as over-smoothing and non-robustness. Previous studies have shown that gradient regularization methods can mitigate these problems. However, existing gradient regularization methods focus on either graph topology or node features. There is no universal approach that integrates the two, which severely limits the efficiency of regularization. In addition, incorporating gradient regularization into HGNNs sometimes causes problems of its own, such as an unstable training process, increased complexity, and insufficient coverage of the regularized information. Furthermore, a complete theoretical analysis of the effects of gradient regularization on HGNNs is still lacking. In this paper, we propose a novel gradient regularization method called Grug, which iteratively applies regularization to the gradients generated by both propagated messages and node features during the message-passing process. Grug provides a unified framework integrating graph topology and node features, based on which we conduct a detailed theoretical analysis of its effectiveness. Specifically, the theoretical analysis elaborates the advantages of Grug: 1) decreasing sample variance during the training process (Stability); 2) enhancing the generalization of the model (Universality); 3) reducing the complexity of the model (Simplicity); 4) improving the integrity and diversity of graph information utilization (Diversity). As a result, Grug has the potential to surpass the theoretical upper bounds set by DropMessage (AAAI-23 Distinguished Papers). In addition, we evaluate Grug on five public real-world datasets with two downstream tasks.
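
To make the idea of regularizing gradients with respect to both propagated messages and node features concrete, the following is a minimal sketch on a toy graph. It is not Grug's actual formulation, which the abstract does not specify. It assumes scalar node features, a single shared weight `w`, sum aggregation, a squared-error loss, and a generic squared-gradient-norm penalty weighted by `lam`. All function and variable names are hypothetical.

```python
# Toy illustration only: a generic gradient-norm penalty, NOT the Grug method.
# Assumptions: scalar features, one shared weight w, sum aggregation,
# squared-error loss, and hand-derived (analytic) gradients.

def forward(x, edges, w):
    """Return per-edge messages m_e = w * x[src] and per-node sums h."""
    messages = [w * x[src] for src, _ in edges]
    h = [0.0] * len(x)
    for (_, dst), m in zip(edges, messages):
        h[dst] += m  # sum aggregation at the destination node
    return messages, h


def task_loss(x, edges, w, y):
    """Squared-error loss 0.5 * sum_i (h_i - y_i)^2."""
    _, h = forward(x, edges, w)
    return 0.5 * sum((hi - yi) ** 2 for hi, yi in zip(h, y))


def gradients(x, edges, w, y):
    """Analytic gradients of the task loss w.r.t. messages and node features."""
    _, h = forward(x, edges, w)
    residual = [hi - yi for hi, yi in zip(h, y)]
    # dL/dm_e: each message only affects its destination node.
    grad_m = [residual[dst] for _, dst in edges]
    # dL/dx_j: chain rule through every message sent from node j.
    grad_x = [0.0] * len(x)
    for (src, _), g in zip(edges, grad_m):
        grad_x[src] += w * g
    return grad_m, grad_x


def regularized_loss(x, edges, w, y, lam):
    """Task loss plus lam * (||dL/dm||^2 + ||dL/dx||^2)."""
    grad_m, grad_x = gradients(x, edges, w, y)
    penalty = sum(g * g for g in grad_m) + sum(g * g for g in grad_x)
    return task_loss(x, edges, w, y) + lam * penalty, penalty


# Usage on a 3-node toy graph with directed edges (src, dst).
x = [1.0, 2.0, -1.0]
edges = [(0, 1), (1, 2), (2, 0), (0, 2)]
y = [0.0, 1.0, 0.0]
total, penalty = regularized_loss(x, edges, w=0.5, y=y, lam=0.1)
print("penalty:", penalty, "regularized loss:", total)
```

The message term of the penalty depends on how information flows along edges (topology), while the feature term depends on the node inputs, so a single objective touches both sources of information. How Grug applies its regularization iteratively during message passing is defined in the paper itself and is not reproduced here.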
