GraphLand: Evaluating Graph Machine Learning Models on Diverse Industrial Data

Although data that can be naturally represented as graphs is widespread in real-world applications across diverse industries, popular graph ML benchmarks for node property prediction cover only a surprisingly narrow set of data domains, and graph neural networks (GNNs) are often evaluated on just a few academic citation networks. This issue is particularly pressing given the growing interest in designing graph foundation models. Such models are expected to transfer to diverse graph datasets from different domains, yet the proposed graph foundation models are often evaluated on a very limited set of datasets from narrow applications. To address this issue, we introduce GraphLand: a benchmark of 14 diverse graph datasets for node property prediction from a range of industrial applications. GraphLand allows evaluating graph ML models on a wide range of graphs with diverse sizes, structural characteristics, and feature sets, all in a unified setting. GraphLand also makes it possible to investigate previously underexplored research questions, such as how realistic temporal distribution shifts in transductive and inductive settings affect the performance of graph ML models. To mimic realistic industrial settings, we use GraphLand to compare GNNs with gradient-boosted decision tree (GBDT) models, which are popular in industrial applications, and show that GBDTs provided with additional graph-based input features can sometimes be very strong baselines. We also evaluate currently available general-purpose graph foundation models and find that they fail to produce competitive results on our proposed datasets.
