In recent years, the rapid development of foundation models and graph pre-training techniques has spurred growing interest in building a universal pre-trained graph model, or Graph Foundation Model (GFM). A significant challenge, however, is that existing models cannot handle feature heterogeneity in graph data that lacks textual information, which hinders the transfer of graph models across datasets. To bridge this gap, we propose the concept of learnable graph patches, which we regard as the smallest semantic units of any graph data. We decompose a graph into learnable graph patches by unfolding the node features and constructing the corresponding patch structures separately. We then design a framework that mines transferable information from graph data across domains. Specifically, after extracting the graph patches, we use a patch encoder to extract knowledge from each unit and a patch aggregator to learn how the units combine into a whole. Because the model is domain-agnostic, it can be applied to downstream data from different domains. Furthermore, we analyze the connection between our method and existing graph models, as well as the transferability of the node embeddings it generates. Empirically, our method can be pre-trained on multi-domain graphs and improves performance across a range of downstream datasets and tasks. Moreover, downstream performance improves consistently as the volume of pre-training data increases.
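As a purely illustrative sketch of the pipeline described above (unfold features, encode each patch, aggregate patches), the following assumes that a patch is a single scalar feature entry, that the encoder is a small shared MLP, and that aggregation is mean pooling followed by neighborhood averaging. None of these choices is specified in the abstract; the function names `unfold_features`, `patch_encoder`, `patch_aggregator`, and `embed` are hypothetical, and the weights are random rather than pre-trained. The point is only to show how a shared per-patch encoder can yield same-sized node embeddings for graphs whose feature dimensions differ.

```python
import numpy as np

def unfold_features(x):
    """Assumed unfolding: turn an (n_nodes, n_feats) matrix into scalar
    patches of shape (n_nodes, n_feats, 1)."""
    return x[:, :, None]

def patch_encoder(patches, w1, b1, w2):
    """Shared two-layer MLP applied to every scalar patch independently,
    so the parameters do not depend on the feature dimension."""
    hidden = np.maximum(patches @ w1 + b1, 0.0)  # (n_nodes, n_feats, hidden_dim)
    return hidden @ w2                           # (n_nodes, n_feats, embed_dim)

def patch_aggregator(encoded, adj):
    """Assumed aggregation: mean-pool the patches of each node, then
    average over the node's neighborhood (self-loop included)."""
    node_repr = encoded.mean(axis=1)             # (n_nodes, embed_dim)
    a = adj + np.eye(adj.shape[0])               # add self-loops
    a = a / a.sum(axis=1, keepdims=True)         # row-normalize
    return a @ node_repr

def embed(x, adj, params):
    w1, b1, w2 = params
    return patch_aggregator(patch_encoder(unfold_features(x), w1, b1, w2), adj)

rng = np.random.default_rng(0)
hidden_dim, embed_dim = 8, 4
# Random stand-ins for what would be pre-trained weights.
params = (
    rng.normal(size=(1, hidden_dim)),
    np.zeros(hidden_dim),
    rng.normal(size=(hidden_dim, embed_dim)),
)

# Two toy graphs with different feature dimensions (5 vs. 11).
x_a = rng.normal(size=(3, 5))
adj_a = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
x_b = rng.normal(size=(4, 11))
adj_b = np.ones((4, 4)) - np.eye(4)

emb_a = embed(x_a, adj_a, params)
emb_b = embed(x_b, adj_b, params)
print(emb_a.shape, emb_b.shape)
```

Because the encoder only ever sees one scalar at a time, the same parameters apply to both toy graphs, and both produce embeddings of width `embed_dim`. The actual patch structure and model design may differ substantially from this simplification.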