Accurate and Fast Estimation of Temporal Motifs using Path Sampling

Counting small subgraphs, called motifs, is a fundamental problem in social network analysis and graph mining. Many real-world networks are directed and temporal: their edges carry timestamps. Motif counting in directed, temporal graphs is especially challenging because there is a plethora of distinct patterns. Temporal motif counts reveal much richer information than their static counterparts, so there is a need for scalable counting algorithms. A major challenge is that there can be trillions of temporal motif matches even in a graph with only millions of vertices. Both the motifs and the input graphs can have multiple edges between two vertices, which leads to a combinatorial explosion. Counting temporal motifs involving just four vertices is not feasible with current state-of-the-art algorithms. We design an algorithm, TEACUPS, that addresses this problem using a novel technique of temporal path sampling. By combining path sampling with carefully designed temporal data structures, we obtain an efficient approximation algorithm for temporal motif counting. TEACUPS is an unbiased estimator with provable concentration behavior, which can be used to bound the estimation error. For a Bitcoin graph with hundreds of millions of edges, TEACUPS runs in less than 1 minute, while the exact counting algorithm takes more than a day. We empirically demonstrate the accuracy of TEACUPS on large datasets, showing an average 30$\times$ speedup (up to 2000$\times$) over existing GPU-based exact counting methods while preserving high estimation accuracy.
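
To make the idea of an unbiased sampling estimator concrete, the following minimal sketch counts a much simpler pattern than those targeted above: two-edge temporal walks u -> v -> w in which the second edge occurs after the first and at most `delta` time units later. This is not the TEACUPS algorithm; the pattern, the uniform edge sampling scheme, and all names (`build_out_index`, `extensions`, `exact_count`, `estimate_count`, `EDGES`) are assumptions chosen purely for illustration. Each sample picks one edge uniformly at random, counts its valid extensions by binary search over sorted timestamps, and rescales by the number of edges, so the expected value of the estimate equals the exact count.

```python
import random
from bisect import bisect_right
from collections import defaultdict


def build_out_index(edges):
    """Map each vertex to the sorted timestamps of its outgoing edges."""
    out_times = defaultdict(list)
    for u, _v, t in edges:
        out_times[u].append(t)
    for times in out_times.values():
        times.sort()
    return out_times


def extensions(edge, out_times, delta):
    """Count edges leaving the head of `edge` with timestamp in (t, t + delta]."""
    _u, v, t = edge
    times = out_times.get(v, [])
    return bisect_right(times, t + delta) - bisect_right(times, t)


def exact_count(edges, delta):
    """Exact number of two-edge temporal walks (vertex distinctness not enforced)."""
    out_times = build_out_index(edges)
    return sum(extensions(e, out_times, delta) for e in edges)


def estimate_count(edges, delta, num_samples, rng):
    """Unbiased estimate: sample first edges uniformly, rescale by edge count."""
    out_times = build_out_index(edges)
    m = len(edges)
    total = 0
    for _ in range(num_samples):
        e = edges[rng.randrange(m)]
        total += extensions(e, out_times, delta)
    return m * total / num_samples


# Hypothetical toy multigraph: (source, target, timestamp), with repeated edges.
EDGES = [
    ("a", "b", 1), ("a", "b", 2),
    ("b", "c", 3), ("b", "c", 4), ("b", "c", 10),
    ("c", "a", 5),
]
```

On this toy graph, `exact_count(EDGES, 3)` returns 6, and `estimate_count(EDGES, 3, 20000, random.Random(0))` lands close to that value. Uniform sampling keeps the sketch short but can have high variance on skewed graphs; weighting samples by how many extensions they admit is the usual remedy, and is beyond what this illustration attempts.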
