The recent Long-Range Graph Benchmark (LRGB, Dwivedi et al. 2022) introduced a set of graph learning tasks that depend strongly on long-range interactions between vertices. Empirical evidence suggests that on these tasks Graph Transformers significantly outperform Message Passing GNNs (MPGNNs). In this paper, we carefully reevaluate multiple MPGNN baselines as well as the Graph Transformer GPS (Rampášek et al. 2022) on LRGB. Through a rigorous empirical analysis, we demonstrate that the reported performance gap is overestimated due to suboptimal hyperparameter choices. Notably, across multiple datasets the performance gap vanishes completely after basic hyperparameter optimization. In addition, we discuss the impact of missing feature normalization for LRGB's vision datasets and highlight a spurious implementation of LRGB's link prediction metric. The principal aim of our paper is to establish a higher standard of empirical rigor within the graph machine learning community.