Graph foundation models face several fundamental challenges, including limited transferability across datasets and data scarcity, which call into question the very feasibility of graph foundation models. However, despite similar challenges, the tabular domain has recently witnessed the emergence of the first successful foundation models, such as TabPFNv2 and LimiX. Many of these models are based on the prior-data fitted networks (PFN) framework, in which models are pretrained on carefully designed synthetic datasets to make predictions in an in-context learning setting. Recently, G2T-FM made a first step towards adapting PFNs to graphs, yet it is limited to hand-crafted features and was never pretrained on graph data. In this work, we take the next step by proposing GraphPFN, a PFN-based model designed and pretrained specifically for graph node-level tasks. Following the PFN framework, we first design a prior distribution over synthetic attributed graphs, using a novel combination of multi-level stochastic block models and a preferential attachment process for structure generation, and graph-aware structural causal models for attribute generation. We then augment the tabular foundation model LimiX with attention-based graph neighborhood aggregation layers and train it on synthetic graphs sampled from our prior. On diverse real-world graph datasets with node-level tasks, GraphPFN shows strong in-context learning performance and achieves state-of-the-art results after finetuning, outperforming both G2T-FM and task-specific GNNs trained from scratch on most datasets. More broadly, GraphPFN demonstrates the potential of PFN-based models for building graph foundation models.
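To make the synthetic-prior idea concrete, here is a minimal, illustrative sketch of how one might sample an attributed graph from a prior of this general shape: a stochastic block model for structure plus a toy graph-aware structural causal model for features and labels. All specifics below (block counts, probabilities, the two-feature SCM) are our own stand-in assumptions for illustration, not the actual prior used by GraphPFN.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Structure: a single-level stochastic block model (an illustrative
# stand-in for the paper's multi-level SBM + preferential attachment prior) ---
n_blocks, nodes_per_block = 4, 25
n = n_blocks * nodes_per_block
block = np.repeat(np.arange(n_blocks), nodes_per_block)
p_in, p_out = 0.3, 0.02  # hypothetical intra- / inter-block edge probabilities
probs = np.where(block[:, None] == block[None, :], p_in, p_out)
adj = np.triu(rng.random((n, n)) < probs, k=1)
adj = adj | adj.T  # symmetric adjacency, no self-loops

# --- Attributes: a toy graph-aware structural causal model where each
# feature depends on exogenous noise plus neighbor-aggregated parents ---
deg = adj.sum(1).clip(min=1)
x0 = rng.normal(size=n)                            # root feature: pure noise
x1 = (adj @ x0) / deg + 0.5 * rng.normal(size=n)   # child: neighbor mean of x0 + noise
y = ((adj @ x1) / deg > 0).astype(int)             # label from aggregated x1
X = np.stack([x0, x1], axis=1)                     # node features, shape (n, 2)
```

A PFN would be pretrained on many such sampled tuples `(adj, X, y)`, learning in-context to map a labeled subset of nodes to predictions on the rest.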