The ability of graph neural networks (GNNs) to count certain graph substructures, especially cycles, is important for the success of GNNs on a wide range of tasks. It has recently become a popular metric for evaluating the expressive power of GNNs. Many of the proposed GNN models with provable cycle counting power are based on subgraph GNNs, i.e., extracting a bag of subgraphs from the input graph, generating representations for each subgraph, and using them to augment the representation of the input graph. However, these methods require heavy preprocessing and suffer from high time and memory costs. In this paper, we overcome the aforementioned limitations of subgraph GNNs by proposing a novel class of GNNs: $d$-Distance-Restricted FWL(2) GNNs, or $d$-DRFWL(2) GNNs. $d$-DRFWL(2) GNNs use node pairs whose mutual distances are at most $d$ as the units for message passing, balancing expressive power against complexity. By performing message passing among distance-restricted node pairs in the original graph, $d$-DRFWL(2) GNNs avoid the expensive subgraph extraction operations in subgraph GNNs, lowering both time and space complexity. We theoretically show that the discriminative power of $d$-DRFWL(2) GNNs strictly increases as $d$ increases. More importantly, $d$-DRFWL(2) GNNs have provably strong cycle counting power even with $d=2$: they can count all 3-, 4-, 5-, and 6-cycles. Since 6-cycles (e.g., benzene rings) are ubiquitous in organic molecules, being able to detect and count them is crucial for achieving robust and generalizable performance on molecular tasks. Experiments on both synthetic and molecular datasets verify our theory. To the best of our knowledge, our model is the most efficient GNN model to date (both theoretically and empirically) that can count up to 6-cycles.
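To make the notion of distance-restricted node pairs concrete, the following is a minimal sketch, not the authors' implementation, of how the pairs with shortest-path distance at most $d$ could be enumerated using a breadth-first search truncated at depth $d$. The function name `distance_restricted_pairs`, the adjacency-dict input format, and the choice to include self-pairs at distance 0 are assumptions made for illustration only.

```python
from collections import deque


def distance_restricted_pairs(adj, d):
    """Return {(u, v): dist} for all ordered node pairs with distance <= d.

    `adj` maps each node to a list of its neighbours (hypothetical input
    format). Self-pairs (u, u) are included with distance 0, which is an
    assumption of this sketch.
    """
    pairs = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == d:
                continue  # truncate the search at radius d
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for target, k in dist.items():
            pairs[(source, target)] = k
    return pairs


# A 6-cycle, loosely standing in for the carbon skeleton of a benzene ring.
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
pairs = distance_restricted_pairs(hexagon, d=2)
print(len(pairs), "of", len(hexagon) ** 2, "ordered pairs are kept")
```

On sparse graphs with small $d$, the number of retained pairs grows with the size of each node's $d$-hop neighbourhood rather than with the square of the number of nodes, which is the intuition behind the claimed savings in time and memory.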