Prevailing graph neural network models have achieved significant progress in graph representation learning. However, in this paper, we uncover a long-overlooked phenomenon: a pre-trained graph representation learning model tested on full graphs underperforms the same model tested on well-pruned graphs. This observation reveals that graphs contain confounders, which may interfere with the model's learning of semantic information, and that current graph representation learning methods have not eliminated their influence. To tackle this issue, we propose Robust Causal Graph Representation Learning (RCGRL) to learn graph representations that are robust against confounding effects. RCGRL introduces an active approach to generate instrumental variables under unconditional moment restrictions, which empowers the graph representation learning model to eliminate confounders and thereby capture discriminative information that is causally related to downstream predictions. We offer theorems and proofs to guarantee the theoretical effectiveness of the proposed approach. Empirically, we conduct extensive experiments on a synthetic dataset and multiple benchmark datasets. The results demonstrate that, compared with state-of-the-art methods, RCGRL achieves better prediction performance and generalization ability.