Hi-GMAE: Hierarchical Graph Masked Autoencoders

Graph Masked Autoencoders (GMAEs) have emerged as a notable self-supervised learning approach for graph-structured data. Existing GMAE models focus primarily on reconstructing node-level information and can therefore be categorized as single-scale GMAEs. While effective in certain contexts, this approach tends to overlook the complex hierarchical structures inherent in many real-world graphs. For instance, molecular graphs exhibit a clear hierarchy of atoms, functional groups, and molecules. Because single-scale GMAE models cannot incorporate these hierarchical relationships, they often fail to capture crucial high-level graph information, which noticeably degrades performance. To address this limitation, we propose Hierarchical Graph Masked Autoencoders (Hi-GMAE), a novel multi-scale GMAE framework designed to handle the hierarchical structures within graphs. First, Hi-GMAE constructs a multi-scale graph hierarchy through graph pooling, enabling the exploration of graph structures at different levels of granularity. To ensure that subgraphs are masked consistently across these scales, we propose a novel coarse-to-fine strategy that initiates masking at the coarsest scale and progressively back-projects the mask to the finer scales. Furthermore, we integrate a gradual recovery strategy with the masking process to mitigate the learning challenges posed by completely masked subgraphs. Unlike GMAE models that use a standard graph neural network (GNN), Hi-GMAE gives its encoder and decoder a hierarchical structure: GNNs at the finer scales for detailed local graph analysis and a graph transformer at the coarser scales to capture global information. Our experiments on 15 graph datasets consistently demonstrate that Hi-GMAE outperforms 17 state-of-the-art self-supervised competitors.
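
The coarse-to-fine masking idea can be illustrated with a minimal sketch. It assumes that pooling yields a hard assignment of each node to exactly one supernode at the next coarser scale; the abstract does not specify the pooling method, and the gradual recovery strategy is not modeled here. The function name `coarse_to_fine_mask` and its parameters are hypothetical and chosen only for illustration.

```python
import random


def coarse_to_fine_mask(assignments, num_coarsest, mask_ratio, seed=0):
    """Sample a mask at the coarsest scale and back-project it to finer scales.

    assignments[l][i] is the index of the supernode at scale l+1 that node i
    at scale l is pooled into (hard assignment; an assumption of this sketch).
    Returns a list of boolean masks ordered from finest to coarsest scale.
    """
    rng = random.Random(seed)

    # Step 1: choose which supernodes to mask at the coarsest scale.
    num_masked = round(mask_ratio * num_coarsest)
    masked_ids = set(rng.sample(range(num_coarsest), num_masked))
    masks = [[i in masked_ids for i in range(num_coarsest)]]

    # Step 2: walk from coarse to fine; each node inherits its parent's mask,
    # so a masked supernode masks its entire subgraph at every finer scale.
    for assign in reversed(assignments):
        parent_mask = masks[0]
        masks.insert(0, [parent_mask[parent] for parent in assign])

    return masks


# Toy hierarchy: 8 atoms -> 4 groups -> 2 coarsest supernodes.
toy_assignments = [
    [0, 0, 1, 1, 2, 2, 3, 3],  # atom  -> group
    [0, 0, 1, 1],              # group -> coarsest supernode
]
toy_masks = coarse_to_fine_mask(toy_assignments, num_coarsest=2, mask_ratio=0.5)
for scale, mask in enumerate(toy_masks):
    print(f"scale {scale}: {sum(mask)} of {len(mask)} nodes masked")
```

Because every node copies the mask of its parent, the masked region at each finer scale is exactly the subgraph pooled into the masked supernodes, which is one way to read the consistency across scales described above. It also means whole subgraphs are hidden at once, the situation the gradual recovery strategy is meant to ease.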
