Towards Unsupervised Graph Completion Learning on Graphs with Features and Structure Missing

In recent years, graph neural networks (GNNs) have made significant progress on a variety of graph analysis tasks. However, their performance degrades severely when the collected node features or structural relationships are partially missing owing to unpredictable factors. Graph completion learning (GCL), which aims to reconstruct the missing node features or structural relationships under the guidance of a specific supervised task, has recently received increasing attention. Although existing GCL methods have been successful, they still suffer from two problems: reliance on labels, and bias in the reconstructed node features and structural relationships. Moreover, the generalization ability of existing GCL methods remains a major challenge when node features and structural relationships are both partially missing. To address these issues, we propose a more general GCL framework, termed unsupervised GCL (UGCL), which uses self-supervised learning to improve the task performance of existing GNN variants on graphs with missing features and structure. Specifically, to avoid the mismatch between missing node features and missing structure during GNN message passing, we separate feature reconstruction from structure reconstruction and design a dedicated model for each. We then introduce a dual contrastive loss at the structure level and the feature level that maximizes the mutual information between node representations from the feature-reconstruction and structure-reconstruction paths, providing additional supervision signals. Finally, the reconstructed node features and structure can be applied to the downstream node classification task. Extensive experiments on eight datasets, three GNN variants, and five missing rates demonstrate the effectiveness of the proposed method.

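The abstract does not give the form of the dual contrastive loss. As a rough illustration of contrasting node representations from two reconstruction paths, the sketch below assumes an InfoNCE-style objective, a common choice whose minimization maximizes a lower bound on mutual information. The function names, the temperature value, and the symmetric averaging are illustrative assumptions, not the authors' formulation, and the sketch covers a single contrast between the two paths rather than separate structure-level and feature-level terms.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss; (view_a[i], view_b[i]) is the positive pair for node i.

    Assumption: all other nodes in view_b act as negatives for node i.
    """
    if not view_a or len(view_a) != len(view_b):
        raise ValueError("both views must be non-empty and cover the same nodes")
    a = [_normalize(v) for v in view_a]
    b = [_normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(a):
        # Cosine similarity of node i (path A) to every node (path B).
        logits = [_dot(anchor, other) / temperature for other in b]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)


def cross_path_contrastive_loss(feature_path, structure_path, temperature=0.5):
    """Hypothetical symmetric loss: average of the two InfoNCE directions."""
    return 0.5 * (
        info_nce(feature_path, structure_path, temperature)
        + info_nce(structure_path, feature_path, temperature)
    )
```

Here `feature_path` and `structure_path` stand for per-node embeddings (one list of floats per node) produced by the two reconstruction paths. The loss is small when each node's two embeddings agree more with each other than with other nodes' embeddings, and grows when they do not.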