The masked graph autoencoder (MGAE) has emerged as a promising self-supervised graph pre-training (SGP) paradigm due to its simplicity and effectiveness. However, existing efforts perform the mask-then-reconstruct operation in the raw data space, as is done in computer vision (CV) and natural language processing (NLP), while neglecting the important non-Euclidean property of graph data. As a result, the highly unstable local connection structures substantially increase the uncertainty in inferring masked data and decrease the reliability of the exploited self-supervision signals, leading to inferior representations for downstream evaluations. To address this issue, we propose a novel SGP method termed Robust mAsked gRaph autoEncoder (RARE), which improves the certainty in inferring masked data and the reliability of the self-supervision mechanism by further masking and reconstructing node samples in the high-order latent feature space. Through both theoretical and empirical analyses, we find that performing a joint mask-then-reconstruct strategy in both the latent feature space and the raw data space yields improved stability and performance. To this end, we design a masked latent feature completion scheme, which predicts the latent features of masked nodes under the guidance of high-order sample correlations that are hard to observe from the raw data perspective. Specifically, we first adopt a latent feature predictor to predict the masked latent features from the visible ones. Next, we encode the raw data of masked samples with a momentum graph encoder and use the resulting representations to refine the predictions through latent feature matching. Extensive experiments on seventeen datasets demonstrate the effectiveness and robustness of RARE against state-of-the-art (SOTA) competitors across three downstream tasks.
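The masked latent feature completion scheme described above (predict masked latents from visible ones, then match them against the output of a momentum encoder) can be illustrated with a minimal sketch. This is not the RARE implementation: the one-step mean-aggregation encoder, the neighbor-mean predictor, the mean-squared-error matching loss, the exponential-moving-average (EMA) update with coefficient 0.99, and every function name below are assumptions chosen for illustration. The raw-space reconstruction branch and the gradient step are omitted.

```python
import random

def linear(x, w):
    """Multiply a feature vector x (length d_in) by a weight matrix w (d_in x d_out)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]

def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def encode(features, adj, w, nodes):
    """Assumed toy encoder: average each node with its neighbors inside `nodes`, then apply w."""
    latents = {}
    for v in nodes:
        neigh = [features[u] for u in adj[v] if u in nodes] + [features[v]]
        latents[v] = linear(mean(neigh), w)
    return latents

def predict_masked(latents, adj, masked):
    """Assumed predictor: a masked node's latent is the mean of its visible neighbors' latents."""
    preds = {}
    for v in masked:
        neigh = [latents[u] for u in adj[v] if u in latents]
        # Fall back to the global mean if no neighbor is visible.
        preds[v] = mean(neigh) if neigh else mean(list(latents.values()))
    return preds

def matching_loss(preds, targets):
    """Mean squared error between predicted and target latents of the masked nodes."""
    total, count = 0.0, 0
    for v, p in preds.items():
        for a, b in zip(p, targets[v]):
            total += (a - b) ** 2
            count += 1
    return total / count

def ema_update(target, online, m=0.99):
    """Momentum update (assumed EMA form): target <- m * target + (1 - m) * online."""
    return [[m * t + (1.0 - m) * o for t, o in zip(t_row, o_row)]
            for t_row, o_row in zip(target, online)]

# Toy graph with four nodes and 2-dimensional features; node 3 is masked.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
features = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.5, 0.5], 3: [0.0, 1.0]}
masked = {3}
visible = set(adj) - masked

rng = random.Random(0)
w_online = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
w_target = [row[:] for row in w_online]  # momentum encoder starts as a copy

latents = encode(features, adj, w_online, visible)   # online encoder sees visible nodes only
preds = predict_masked(latents, adj, masked)         # predict latents of masked nodes
targets = encode(features, adj, w_target, set(adj))  # momentum encoder sees the raw masked data
loss = matching_loss(preds, {v: targets[v] for v in masked})
w_target = ema_update(w_target, w_online)            # target weights follow the online weights
print("latent matching loss:", loss)
```

In a full training loop, the online encoder and the predictor would be updated by gradient descent on this loss combined with the raw-space reconstruction loss, while the momentum encoder would receive only the EMA update.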