Subgraph counting is the problem of counting the occurrences of a given query graph in a large target graph. Large-scale subgraph counting is useful in various domains, such as motif counting for social network analysis and loop counting for money laundering detection on transaction networks. Recently, neural methods have been proposed to address the exponential runtime complexity of large-scale subgraph counting. However, existing neural counting approaches fall short in three aspects. First, the counts of the same query can vary from zero to millions across target graphs, posing a much larger challenge than most graph regression tasks. Second, current scalable graph neural networks have limited expressive power and fail to efficiently distinguish graphs for count prediction. Third, existing neural approaches cannot predict where the query occurs in the target graph. Here we design DeSCo, a scalable neural deep subgraph counting pipeline that aims to accurately predict both the query count and the occurrence positions on any target graph after one-time training. First, DeSCo uses a novel canonical partition to divide the large target graph into small neighborhood graphs. This technique greatly reduces count variation while guaranteeing that no occurrence is missed or double-counted. Second, neighborhood counting uses an expressive subgraph-based heterogeneous graph neural network to count accurately within each neighborhood. Finally, gossip propagation propagates neighborhood counts with learnable gates to harness the inductive biases of motif counts. DeSCo is evaluated on eight real-world datasets from various domains. It outperforms state-of-the-art neural methods with a 137x improvement in the mean squared error of count prediction, while maintaining polynomial runtime complexity.
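The abstract does not define the canonical partition precisely, so the following is only a minimal brute-force sketch of how a partition can avoid missed or double counts, under assumptions that are ours rather than the paper's. We assume an occurrence is a node set whose induced subgraph is isomorphic to a small connected query, that the canonical node of an occurrence is its highest-indexed node, and that the neighborhood of node `v` keeps only nodes with index at most `v` within `d` hops, where `d` is the query diameter. Under these assumptions each occurrence falls in exactly one neighborhood (the one rooted at its canonical node), so summing per-neighborhood counts recovers the exact total. The exact counter stands in for DeSCo's neural neighborhood counter; all function names are hypothetical.

```python
from collections import deque
from itertools import combinations, permutations


def bfs_dist(adj, src, max_dist=None):
    """Shortest-path distances from src, optionally cut off at max_dist hops."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if max_dist is not None and dist[u] >= max_dist:
            continue  # do not expand beyond the hop limit
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def diameter(adj):
    """Diameter of a small connected graph (BFS from every node)."""
    return max(max(bfs_dist(adj, s).values()) for s in adj)


def edge_set(adj, nodes):
    """Undirected edges of the subgraph induced by `nodes`."""
    return {frozenset((u, w)) for u in nodes for w in adj[u] if w in nodes}


def is_occurrence(adj, nodes, query):
    """True if the subgraph of `adj` induced by `nodes` is isomorphic to `query`."""
    q_nodes = list(query)
    q_edges = edge_set(query, set(q_nodes))
    t_edges = edge_set(adj, nodes)
    if len(t_edges) != len(q_edges):
        return False
    for perm in permutations(nodes):
        mapping = dict(zip(q_nodes, perm))
        # Same edge count + every query edge mapped onto a target edge => isomorphic.
        if all(frozenset(mapping[x] for x in e) in t_edges for e in q_edges):
            return True
    return False


def count_occurrences(adj, query, candidates, must_contain=None):
    """Exact count of query occurrences among `candidates` (brute force)."""
    total = 0
    for combo in combinations(sorted(candidates), len(query)):
        if must_contain is not None and must_contain not in combo:
            continue
        if is_occurrence(adj, set(combo), query):
            total += 1
    return total


def canonical_counts(adj, query):
    """Per-node counts of occurrences whose highest-indexed node is that node."""
    d = diameter(query)
    counts = {}
    for v in adj:
        # Hypothetical canonical neighborhood: within d hops of v, ids not above v.
        hood = {u for u in bfs_dist(adj, v, d) if u <= v}
        counts[v] = count_occurrences(adj, query, hood, must_contain=v)
    return counts


if __name__ == "__main__":
    # K4 on nodes 0..3 plus an extra triangle {3, 4, 5}.
    target = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3],
              3: [0, 1, 2, 4, 5], 4: [3, 5], 5: [3, 4]}
    triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    counts = canonical_counts(target, triangle)
    print(counts)
    print(sum(counts.values()), count_occurrences(target, triangle, target))
```

In this toy graph, node 3 is the canonical node of three of the K4 triangles and node 5 of the extra triangle, so the per-node counts sum to the same total as counting over the whole graph. Brute-force enumeration is exponential in the query size; it is used here only to make the no-miss, no-double-count property easy to check.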