Boosting the Cycle Counting Power of Graph Neural Networks with I$^2$-GNNs

Message Passing Neural Networks (MPNNs) are a widely used class of Graph Neural Networks (GNNs). Their limited representational power has inspired the study of provably powerful GNN architectures. However, knowing that one model is more powerful than another gives little insight into which functions each can or cannot express. It remains unclear whether these models can approximate specific functions such as counting certain graph substructures, which is essential for applications in biology, chemistry and social network analysis. Motivated by this, we study the counting power of Subgraph MPNNs, a recent and popular class of powerful GNN models that extract a rooted subgraph for each node, assign the root node a unique identifier and encode the root node's representation within its rooted subgraph. Specifically, we prove that Subgraph MPNNs fail to count cycles of length greater than four at node level, implying that their node representations cannot correctly encode surrounding substructures such as ring systems with more than four atoms. To overcome this limitation, we propose I$^2$-GNNs, which extend Subgraph MPNNs by assigning different identifiers to the root node and its neighbors in each subgraph. The discriminative power of I$^2$-GNNs is shown to be strictly stronger than that of Subgraph MPNNs and partially stronger than that of the 3-WL test. More importantly, I$^2$-GNNs are proven capable of counting all 3-, 4-, 5- and 6-cycles, covering common substructures such as benzene rings in organic chemistry, while still keeping linear complexity. To the best of our knowledge, this is the first linear-time GNN model that can count 6-cycles with theoretical guarantees. We validate its counting power on cycle counting tasks and demonstrate its competitive performance on molecular prediction benchmarks.
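To make the node-level counting target concrete, the following is a minimal brute-force sketch of the quantity a model would need to reproduce: the number of simple k-cycles passing through a given node. It is not the I$^2$-GNN architecture and is not linear-time; the names `build_graph` and `count_cycles_through` are hypothetical helpers introduced here for illustration only.

```python
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[int, Set[int]]


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build an undirected adjacency map from an edge list (illustrative helper)."""
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_cycles_through(adj: Graph, root: int, k: int) -> int:
    """Count simple cycles of length k (k >= 3) that contain `root`.

    Brute-force enumeration of simple paths starting at `root`; this is a
    reference for the target function, not an efficient algorithm.
    """
    if k < 3:
        raise ValueError("cycle length must be at least 3")

    def extend(node: int, visited: Set[int], remaining: int) -> int:
        # `remaining` is the number of edges still needed to close the cycle.
        if remaining == 1:
            return 1 if root in adj[node] else 0
        total = 0
        for nxt in adj[node]:
            if nxt not in visited:
                visited.add(nxt)
                total += extend(nxt, visited, remaining - 1)
                visited.remove(nxt)
        return total

    # Each cycle is traversed once in each of its two directions.
    return extend(root, {root}, k) // 2


if __name__ == "__main__":
    # A benzene-like ring: six nodes joined in a single 6-cycle.
    ring = build_graph([(i, (i + 1) % 6) for i in range(6)])
    print({k: count_cycles_through(ring, 0, k) for k in (3, 4, 5, 6)})
    # {3: 0, 4: 0, 5: 0, 6: 1}
```

On the six-node ring, every node lies on exactly one 6-cycle and on no shorter cycle, which is the kind of node-level label used in cycle counting tasks.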
