Graph Masked Autoencoders (GMAEs) have emerged as a notable self-supervised learning approach for graph-structured data. Existing GMAE models primarily focus on reconstructing node-level information and can therefore be regarded as single-scale GMAEs. This methodology, while effective in certain contexts, tends to overlook the complex hierarchical structures inherent in many real-world graphs. For instance, molecular graphs exhibit a clear hierarchical organization: atoms form functional groups, which in turn form molecules. Because single-scale GMAE models cannot incorporate these hierarchical relationships, they often fail to capture crucial high-level graph information, resulting in a noticeable decline in performance. To address this limitation, we propose Hierarchical Graph Masked AutoEncoders (Hi-GMAE), a novel multi-scale GMAE framework designed to handle the hierarchical structures within graphs. First, Hi-GMAE constructs a multi-scale graph hierarchy through graph pooling, enabling the exploration of graph structures at different levels of granularity. To ensure that subgraphs are masked consistently across these scales, we propose a novel coarse-to-fine strategy that initiates masking at the coarsest scale and progressively back-projects the mask to the finer scales. Furthermore, we integrate a gradual recovery strategy with the masking process to mitigate the learning challenges posed by completely masked subgraphs. Unlike existing GMAE models, which use a standard graph neural network (GNN), Hi-GMAE gives both its encoder and its decoder a hierarchical structure: GNNs at the finer scales perform detailed local graph analysis, and a graph transformer at the coarser scales captures global information. Our experiments on 15 graph datasets consistently demonstrate that Hi-GMAE outperforms 17 state-of-the-art self-supervised competitors.
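The coarse-to-fine masking idea can be illustrated with a minimal sketch. It is not the Hi-GMAE implementation: the function name `coarse_to_fine_mask`, the representation of pooling as hard assignment arrays, and the uniform random choice of masked supernodes are all assumptions made for illustration, and the gradual recovery strategy is left out.

```python
import numpy as np


def coarse_to_fine_mask(assignments, num_coarsest, mask_ratio, seed=0):
    """Return one boolean mask per scale, ordered from finest to coarsest.

    assignments[s][i] is the index of the scale-(s+1) supernode that node i
    of scale s was pooled into. Hard (one-parent) assignments are an
    assumption of this sketch; the pooling operator itself is not modeled.
    """
    rng = np.random.default_rng(seed)

    # Step 1: sample the mask at the coarsest scale only.
    num_masked = int(round(mask_ratio * num_coarsest))
    coarse_mask = np.zeros(num_coarsest, dtype=bool)
    chosen = rng.choice(num_coarsest, size=num_masked, replace=False)
    coarse_mask[chosen] = True

    # Step 2: back-project scale by scale. A node is masked exactly when
    # the supernode it belongs to is masked.
    masks = [coarse_mask]
    for assign in reversed(assignments):
        masks.append(masks[-1][np.asarray(assign)])

    return masks[::-1]


# Toy hierarchy (hypothetical): 6 nodes -> 3 supernodes -> 2 supernodes.
assignments = [np.array([0, 0, 1, 1, 2, 2]), np.array([0, 0, 1])]
masks = coarse_to_fine_mask(assignments, num_coarsest=2, mask_ratio=0.5)
for scale, mask in enumerate(masks):
    print(f"scale {scale}: {mask.astype(int)}")
```

Because every finer mask is derived by indexing the coarser one, a masked supernode always corresponds to a fully masked subgraph at the finer scales. This is the situation the abstract says the gradual recovery strategy is meant to ease.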