Graphs are fundamental data structures for modeling complex interactions in domains such as social networks, molecular structures, and biological systems. Graph-level tasks, which predict properties or labels for entire graphs, are crucial for applications such as molecular property prediction and subgraph counting. While Graph Neural Networks (GNNs) have shown significant promise on these tasks, their evaluations are often limited by narrow dataset selection, limited task coverage, and inconsistent experimental setups, which hinders conclusions about their generalizability. In this paper, we present a comprehensive experimental study of GNNs on graph-level tasks, systematically categorizing them into five types: node-based, hierarchical pooling-based, subgraph-based, graph learning-based, and self-supervised learning-based GNNs. To address the evaluation challenges above, we propose OpenGLT, a unified evaluation framework for graph-level GNNs. OpenGLT standardizes the evaluation process across diverse datasets, multiple graph tasks (e.g., classification and regression), and real-world scenarios, including noisy, imbalanced, and few-shot graphs. We conduct extensive experiments on 16 baseline models spanning the five categories, evaluated on 13 graph classification and 13 graph regression datasets. These experiments provide comprehensive insights into the strengths and weaknesses of existing GNN architectures.