OpenGraph: Towards Open Graph Foundation Models

Graph learning has become indispensable for interpreting and harnessing relational data in diverse fields, ranging from recommendation systems to social network analysis. In this context, a variety of graph neural networks (GNNs) have emerged as promising methods for encoding the structural information of graphs. By effectively capturing a graph's underlying structure, GNNs have shown great potential for improving performance on graph learning tasks such as link prediction and node classification. Despite these successes, a significant challenge persists: these methods often struggle to generalize to unseen graph data that differs significantly from the training instances. In this work, we aim to advance the graph learning paradigm by developing a general graph foundation model. The model is designed to understand the complex topological patterns present in diverse graph data, enabling it to excel at zero-shot graph learning tasks across different downstream datasets. To achieve this goal, our OpenGraph model addresses several key technical challenges. First, we propose a unified graph tokenizer that allows the model to generalize well to unseen graph data, even when the underlying graph properties differ significantly from those encountered during training. Second, we develop a scalable graph transformer as the foundational encoder, which effectively captures node-wise dependencies within the global topological context. Third, we introduce a data augmentation mechanism enhanced by a large language model (LLM) to alleviate data scarcity in real-world scenarios. Extensive experiments validate the effectiveness of our framework. By adapting to new graph characteristics and comprehending the nuances of diverse graphs, OpenGraph achieves remarkable zero-shot graph learning performance across various settings and domains.
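
The abstract does not say how the unified graph tokenizer is built. As an illustration of the general idea only, the sketch below maps graphs of different sizes into node tokens of one shared dimension, using nothing but topology. The choice of a truncated SVD of the normalized adjacency matrix, the function names normalized_adjacency and topology_tokens, and the dim parameter are all assumptions made for this example, not details taken from OpenGraph.

```python
import numpy as np


def normalized_adjacency(edges, num_nodes):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected graph.

    Self-loops are added so every node has degree >= 1 (assumption of this sketch).
    """
    adj = np.eye(num_nodes)
    for u, v in edges:
        adj[u, v] = 1.0
        adj[v, u] = 1.0
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def topology_tokens(edges, num_nodes, dim):
    """Map each node to a `dim`-dimensional token using only graph structure.

    Hypothetical stand-in for a graph tokenizer: keep the top singular
    directions of the normalized adjacency and zero-pad small graphs.
    """
    adj_norm = normalized_adjacency(edges, num_nodes)
    u, s, _ = np.linalg.svd(adj_norm)
    k = min(dim, num_nodes)
    tokens = np.zeros((num_nodes, dim))
    tokens[:, :k] = u[:, :k] * np.sqrt(s[:k])
    return tokens


# Two toy graphs with different node counts and no shared node features.
path_graph = [(0, 1), (1, 2), (2, 3)]
star_graph = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5)]

path_tokens = topology_tokens(path_graph, num_nodes=4, dim=8)
star_tokens = topology_tokens(star_graph, num_nodes=6, dim=8)
print(path_tokens.shape, star_tokens.shape)
```

Both graphs end up with tokens of the same width, so a single encoder could in principle consume either one without dataset-specific input layers. A full dense SVD is used here for brevity and would not scale to large graphs.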
