Graphs model complex relationships between entities, with nodes and edges capturing intricate connections. Node representation learning transforms nodes into low-dimensional embeddings, which are typically used as features for downstream tasks; their quality therefore has a significant impact on task performance. Existing approaches to node representation learning span (semi-)supervised, unsupervised, and self-supervised paradigms. In graph domains, (semi-)supervised learning often optimizes models solely on class labels, neglecting other abundant graph signals and thereby limiting generalization. While self-supervised or unsupervised learning produces representations that better capture the underlying graph signals, the usefulness of those signals for a downstream target task can vary. To bridge this gap, we introduce Target-Aware Contrastive Learning (Target-aware CL), which aims to enhance target-task performance by maximizing the mutual information between the target task and node representations through a self-supervised learning process. This is achieved with a sampling function, the XGBoost Sampler (XGSampler), which samples proper positive examples for the proposed Target-Aware Contrastive Loss (XTCL). Minimizing XTCL increases the mutual information between the target task and node representations, thereby improving model generalization. Additionally, XGSampler enhances interpretability by exposing the weight each graph signal contributes to sampling proper positive examples. We show experimentally that XTCL significantly outperforms state-of-the-art models on two target tasks: node classification and link prediction.
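To make the mechanism concrete, the following is a minimal sketch, not the paper's implementation: an InfoNCE-style contrastive loss in which each anchor's positive example is supplied by an external sampler, standing in for the role XGSampler plays for XTCL. All names here (`target_aware_nce`, `pos_idx`, `tau`) are illustrative assumptions.

```python
# Hedged sketch of a target-aware contrastive objective: the sampler's only
# lever is the choice of pos_idx, i.e. which node counts as each anchor's
# positive example; all other nodes act as negatives.
import numpy as np

def target_aware_nce(z, pos_idx, tau=0.5):
    """InfoNCE-style loss over node embeddings.

    z       : (n, d) array of node embeddings.
    pos_idx : pos_idx[i] is the index of anchor i's positive example,
              as chosen by some target-aware sampler.
    tau     : temperature.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = (z @ z.T) / tau
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    log_den = np.log(np.exp(sim).sum(axis=1))          # log-sum over all candidates
    n = z.shape[0]
    return float(-(sim[np.arange(n), pos_idx] - log_den).mean())
```

Because only the numerator of the softmax depends on which positives are chosen, pairing each anchor with a more target-relevant node directly lowers the loss; that choice is exactly what a learned sampler such as XGSampler is meant to optimize.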