Unsupervised graph representation learning (UGRL) based on graph neural networks (GNNs) has received increasing attention owing to its efficacy in handling graph-structured data. However, existing UGRL methods idealistically assume that node features are noise-free, so they fail to distinguish useful information from noise when applied to real data with noisy features, which degrades the quality of the learned representations. This motivates us to take noisy node features into account in real-world UGRL. Through empirical analysis, we reveal that feature propagation, the essential operation in GNNs, acts as a "double-edged sword" in handling noisy features: it can both denoise and diffuse noise, leading to varying feature quality across nodes, and even within the same node at different hops. Building on this insight, we propose a novel UGRL method based on Multi-hop feature Quality Estimation (MQE for short). Unlike most UGRL models, which directly use propagation-based GNNs to generate representations, our approach learns representations by estimating the quality of propagated features at different hops. Specifically, we introduce a Gaussian model that uses a learnable "meta-representation" as a condition to estimate, via neural networks, the expectation and variance of the multi-hop propagated features. In this way, the meta-representation captures the semantic and structural information underlying the multiple propagated features but is naturally less susceptible to noise, and thereby serves as a high-quality node representation for downstream tasks. Extensive experiments on multiple real-world datasets demonstrate that MQE learns reliable node representations in scenarios with diverse types of feature noise.
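To make the two ingredients named in the abstract concrete, the following is a minimal NumPy sketch of multi-hop feature propagation and of a Gaussian negative log-likelihood conditioned on a per-node meta-representation. It is an illustration under stated assumptions, not the authors' implementation: the symmetric normalization with self-loops, the diagonal covariance, the function names (`normalized_adjacency`, `multi_hop_features`, `gaussian_nll`), and the random linear maps standing in for the neural-network estimators are all hypothetical choices, and no training loop is shown.

```python
import numpy as np


def normalized_adjacency(adj):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2.

    Assumption: a GCN-style propagation matrix; the abstract does not
    specify which normalization is used.
    """
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def multi_hop_features(adj, x, num_hops):
    """Return the list [X, PX, P^2 X, ...] of propagated features (0..num_hops)."""
    p = normalized_adjacency(adj)
    feats = [x]
    for _ in range(num_hops):
        feats.append(p @ feats[-1])
    return feats


def gaussian_nll(target, mean, log_var):
    """Per-node negative log-likelihood of a diagonal Gaussian (constant dropped)."""
    return 0.5 * (log_var + (target - mean) ** 2 / np.exp(log_var)).sum(axis=1)


rng = np.random.default_rng(0)

# Toy path graph with 4 nodes and 3-dimensional (possibly noisy) features.
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
x = rng.normal(size=(4, 3))
feats = multi_hop_features(adj, x, num_hops=2)

# Meta-representation: one 2-dimensional vector per node. It would be a
# learnable parameter in practice; here it is only randomly initialized.
z = rng.normal(size=(4, 2))

# One mean estimator and one log-variance estimator per hop. Linear maps
# are a stand-in for the neural networks mentioned in the abstract.
losses = []
for h in feats:
    w_mean = 0.1 * rng.normal(size=(2, 3))
    w_log_var = 0.1 * rng.normal(size=(2, 3))
    losses.append(gaussian_nll(h, z @ w_mean, z @ w_log_var))

# Shape (num_nodes, num_hops + 1): one likelihood term per node and hop.
loss_per_node_hop = np.stack(losses, axis=1)
```

In this sketch a large predicted variance for a given node and hop down-weights the squared-error term for that entry, which is one way to read "quality estimation": hops whose propagated features are hard to explain from the meta-representation contribute less to the fit.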