Machine learning on graphs has made substantial progress across domains such as molecular property prediction and chip design. Yet benchmarking practices remain fragmented, often relying on narrow, task-specific datasets and inconsistent evaluation protocols, hindering reproducibility and broader progress. With the recent popularity of graph foundation models, these weaknesses have become apparent, as existing benchmarks are insufficient for thorough evaluation. To address these challenges, we introduce GraphBench, a comprehensive benchmark suite spanning diverse real-world domains and task settings, including node-level, edge-level, graph-level, and generative tasks. GraphBench provides standardized evaluation protocols, including consistent dataset splits and metrics for assessing out-of-distribution generalization across selected tasks, as well as a unified hyperparameter-tuning framework. We further evaluate GraphBench with recent message-passing neural networks and graph transformer models, establishing principled baselines for future research. See www.graphbench.io for further details.