Pseudo-labeling is a technique for improving the performance of semi-supervised Graph Neural Networks (GNNs) by generating additional pseudo-labels from confident predictions. However, the quality of the generated pseudo-labels has been a longstanding concern, because the classification objective is sensitive to the given labels. To avoid untrustworthy classification supervision of the form ``a node belongs to a specific class,'' we favor fault-tolerant contrastive supervision of the form ``two nodes do not belong to the same class.'' The problem of generating high-quality pseudo-labels is thereby relaxed to that of identifying reliable negative pairs. To this end, we propose a general framework for GNNs, termed Pseudo Contrastive Learning (PCL). It separates two nodes whose positive and negative pseudo-labels target the same class. To incorporate topological knowledge into learning, we devise a topologically weighted contrastive loss that spends more effort separating negative pairs with smaller topological distances. Experimentally, we apply PCL to various GNNs, which consistently outperform their counterparts using other popular general techniques on five real-world graphs.
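The idea of selecting negative pairs from pseudo-labels and weighting them by topological distance can be illustrated with a minimal sketch. It is not the formulation used in PCL; everything specific in it is an assumption made for illustration: the confidence thresholds `pos_thresh` and `neg_thresh`, the reading that node i is paired with node j when i is confidently in class c and j is confidently not in class c, the inverse hop-count weight, the softplus penalty on cosine similarity, and all function names.

```python
import math
from collections import deque


def hop_distances(adj, source):
    """Breadth-first search: hop count from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def pseudo_negative_pairs(probs, pos_thresh=0.8, neg_thresh=0.1):
    """Pair (i, j) when i's positive pseudo-label and j's negative
    pseudo-label name the same class (thresholds are assumed values)."""
    pairs = []
    for i, p_i in enumerate(probs):
        c = max(range(len(p_i)), key=p_i.__getitem__)  # most likely class of i
        if p_i[c] < pos_thresh:
            continue  # i is not confident enough to get a positive pseudo-label
        for j, p_j in enumerate(probs):
            if j != i and p_j[c] <= neg_thresh:  # j is confidently NOT in class c
                pairs.append((i, j))
    return pairs


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def weighted_negative_loss(emb, adj, pairs, temperature=0.5):
    """Weighted average of a softplus penalty on pair similarity.
    Weight 1 / hop distance is an assumed choice: closer pairs count more."""
    total, weight_sum = 0.0, 0.0
    for i, j in pairs:
        d = hop_distances(adj, i).get(j)
        if d is None:
            continue  # unreachable pair: skipped in this sketch
        w = 1.0 / d
        total += w * math.log1p(math.exp(cosine(emb[i], emb[j]) / temperature))
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0


# Toy path graph 0 - 1 - 2 - 3 with two classes.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
probs = [[0.95, 0.05], [0.6, 0.4], [0.05, 0.95], [0.5, 0.5]]
emb = [[1.0, 0.0], [0.7, 0.3], [0.9, 0.1], [0.5, 0.5]]
pairs = pseudo_negative_pairs(probs)
loss = weighted_negative_loss(emb, adj, pairs)
```

In this toy graph only nodes 0 and 2 are confident enough to form negative pairs. The loss shrinks as their embeddings move apart, which is the behavior a contrastive term on negative pairs is meant to encourage.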