The expressive power of graph neural networks is usually measured by comparing how many pairs of graphs or nodes an architecture can distinguish as non-isomorphic with how many the $k$-dimensional Weisfeiler-Lehman ($k$-WL) test can distinguish. In this paper, we uncover misalignments between graph machine learning practitioners' conceptualizations of expressive power and $k$-WL through a systematic analysis of the reliability and validity of $k$-WL. We conduct a survey ($n = 18$) of practitioners to surface their conceptualizations of expressive power and their assumptions about $k$-WL. In contrast to practitioners' opinions, our analysis (which draws from graph theory and benchmark auditing) reveals that $k$-WL does not guarantee isometry, can be irrelevant to real-world graph tasks, and may not promote generalization or trustworthiness. We argue for extensional definitions and measurement of expressive power based on benchmarks. We further contribute guiding questions for constructing such benchmarks, a step that is critical for graph machine learning practitioners to develop and transparently communicate their understanding of expressive power.
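To make the comparison against $k$-WL concrete, the following is a minimal sketch of the $k = 1$ case (colour refinement), written for illustration only and not taken from the paper. The function names `wl_histograms` and `wl_distinguishes` and the example graphs are assumptions introduced here. Graphs are given as plain adjacency dictionaries, and two graphs count as distinguished when their final colour histograms differ.

```python
from collections import Counter


def wl_histograms(graphs, rounds=None):
    """Run 1-WL colour refinement jointly on several graphs.

    Each graph is a dict mapping a node to a list of its neighbours.
    Returns one Counter of final colours per graph. (Hypothetical helper.)
    """
    # Every node starts with the same colour.
    colours = [{v: 0 for v in g} for g in graphs]
    if rounds is None:
        # The colouring stabilises after at most n refinement rounds.
        rounds = max(len(g) for g in graphs)
    for _ in range(rounds):
        palette = {}  # shared across graphs so colour ids are comparable
        refined = []
        for g, col in zip(graphs, colours):
            new_col = {}
            for v, nbrs in g.items():
                # Signature: own colour plus the multiset of neighbour colours.
                sig = (col[v], tuple(sorted(col[u] for u in nbrs)))
                new_col[v] = palette.setdefault(sig, len(palette))
            refined.append(new_col)
        colours = refined
    return [Counter(c.values()) for c in colours]


def wl_distinguishes(g1, g2):
    """True if 1-WL assigns the two graphs different colour histograms."""
    h1, h2 = wl_histograms([g1, g2])
    return h1 != h2


# Two non-isomorphic 2-regular graphs on six nodes.
six_cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}

# A pair that differs in its degree sequence.
path3 = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

print(wl_distinguishes(six_cycle, two_triangles))  # False
print(wl_distinguishes(path3, triangle))           # True
```

The six-cycle and the pair of disjoint triangles are not isomorphic (only one of them is connected), yet every node in both has degree 2, so 1-WL never splits the initial colour class and reports them as indistinguishable. Under the measurement practice described above, an architecture bounded by 1-WL would be scored as unable to separate this pair.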