Research on the theoretical expressiveness of Graph Neural Networks (GNNs) has advanced rapidly, and many methods have been proposed to enhance it. However, apart from a few methods that strictly follow the $k$-dimensional Weisfeiler-Lehman ($k$-WL) test hierarchy, most lack a uniform measure of expressiveness. Their theoretical analyses are often limited to showing that a model can distinguish certain families of non-isomorphic graphs, which makes it difficult to compare expressiveness quantitatively. Instead of theoretical analysis, expressiveness can also be measured by evaluating model performance on datasets containing 1-WL-indistinguishable graphs. Previous datasets designed for this purpose, however, suffer from three problems: difficulty (they are too easy, since nearly any model beyond 1-WL reaches close to 100% accuracy), granularity (models tend to be either 100% correct or close to random guessing), and scale (each dataset contains only a few essentially different graphs). To address these limitations, we propose a new expressiveness dataset, $\textbf{BREC}$, which includes 400 pairs of non-isomorphic graphs carefully selected from four primary categories (Basic, Regular, Extension, and CFI). These graphs offer higher difficulty (up to 4-WL-indistinguishable), finer granularity (able to compare models between 1-WL and 3-WL), and a larger scale (400 pairs). We further systematically test 16 models with beyond-1-WL expressiveness on BREC. Our experiments provide the first thorough comparison of the expressiveness of these state-of-the-art beyond-1-WL GNNs. We expect this dataset to serve as a benchmark for testing the expressiveness of future GNNs. Our dataset and evaluation code are released at: https://github.com/GraphPKU/BREC.
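To make "1-WL-indistinguishable" concrete, here is a minimal sketch of 1-WL color refinement in Python, assuming graphs are given as adjacency-list dictionaries. The helper name `wl_indistinguishable` is hypothetical and is not part of the BREC evaluation code, whose actual test procedure may differ. Colors are refined jointly on both graphs so that color IDs are comparable, and the graphs count as indistinguishable when their final color histograms coincide. The classic example is a 6-cycle versus two disjoint triangles: both are 2-regular, so 1-WL never splits the initial color class even though the graphs are not isomorphic.

```python
from collections import Counter

def wl_indistinguishable(g1, g2, rounds=None):
    """Hypothetical helper: True if 1-WL color refinement cannot tell g1 from g2.

    g1, g2: dict mapping each node to a list of its neighbors (undirected).
    """
    # Refine on the disjoint union so both graphs share one color palette.
    adj = {(0, v): [(0, u) for u in g1[v]] for v in g1}
    adj.update({(1, v): [(1, u) for u in g2[v]] for v in g2})
    colors = {n: 0 for n in adj}  # every node starts with the same color
    if rounds is None:
        rounds = len(adj)  # refinement stabilizes within |V| rounds
    for _ in range(rounds):
        # New signature = own color plus the sorted multiset of neighbor colors.
        sigs = {n: (colors[n], tuple(sorted(colors[m] for m in adj[n]))) for n in adj}
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        new = {n: palette[sigs[n]] for n in adj}
        stable = len(set(new.values())) == len(set(colors.values()))
        colors = new
        if stable:  # no color class was split, so the partition is final
            break

    def histogram(gid):
        return Counter(c for (g, _), c in colors.items() if g == gid)

    return histogram(0) == histogram(1)

# A 6-cycle and two disjoint triangles: non-isomorphic, both 2-regular.
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(wl_indistinguishable(hexagon, two_triangles))  # True
```

Pairs like this one are exactly what any beyond-1-WL model should separate; BREC's harder categories go further, up to pairs that even 3-WL cannot distinguish.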