Graphs are fundamental data structures for modeling complex interactions in domains such as social networks, molecular structures, and biological systems. Graph-level tasks, which involve predicting properties or labels for entire graphs, are crucial for applications like molecular property prediction and subgraph counting. While Graph Neural Networks (GNNs) have shown significant promise for these tasks, their evaluations are often limited by narrow dataset selection, insufficient architecture coverage, restricted task scope and scenarios, and inconsistent experimental setups, making it difficult to draw reliable conclusions across domains. In this paper, we present a comprehensive experimental study of GNNs on graph-level tasks, systematically categorizing them into five types: node-based, hierarchical pooling-based, subgraph-based, graph learning-based, and self-supervised learning (SSL)-based GNNs. We propose a unified evaluation framework, OpenGLT, which standardizes evaluation across four domains (social networks, biology, chemistry, and motif counting), two task types (classification and regression), and four real-world scenarios (clean, noisy, imbalanced, and few-shot graphs). Extensive experiments on 20 models across 26 classification and regression datasets reveal that: (i) no single architecture universally dominates in both effectiveness and efficiency; specifically, subgraph-based GNNs excel in expressiveness, graph learning-based and SSL-based methods in robustness, and node-based and pooling-based models in efficiency; and (ii) specific graph topological features such as density and centrality can partially guide the selection of suitable GNN architectures for different graph characteristics.
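To make finding (ii) concrete, the following is a minimal sketch of how two of the topological features named above, density and centrality, could be computed for a single graph given as an edge list. It is illustrative only and is not taken from OpenGLT: the function names `graph_density` and `degree_centrality` are hypothetical, the graph is assumed to be undirected and simple with nodes labeled `0..num_nodes-1`, and degree centrality is used as just one of several possible centrality measures.

```python
from collections import defaultdict


def graph_density(num_nodes, edges):
    """Density of an undirected simple graph: 2|E| / (|V| * (|V| - 1))."""
    if num_nodes < 2:
        return 0.0
    # Ignore self-loops and count each undirected edge once.
    unique_edges = {frozenset((u, v)) for u, v in edges if u != v}
    return 2 * len(unique_edges) / (num_nodes * (num_nodes - 1))


def degree_centrality(num_nodes, edges):
    """Degree centrality of each node: degree / (|V| - 1)."""
    if num_nodes < 2:
        return {n: 0.0 for n in range(num_nodes)}
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # skip self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)
    return {n: len(neighbors[n]) / (num_nodes - 1) for n in range(num_nodes)}


# Hypothetical example graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 2.
example_edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
density = graph_density(4, example_edges)
centrality = degree_centrality(4, example_edges)
```

Per-graph statistics of this kind could then be averaged over a dataset and compared against model performance, which is one plausible way to relate graph characteristics to architecture choice.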