Temporal bipartite graphs are widely used to represent time-evolving relationships between two disjoint sets of nodes, such as customer-product interactions in e-commerce and user-group memberships in social networks. Temporal butterflies, i.e., $(2,2)$-bicliques whose edges occur within a short period and in a prescribed order, are essential for modeling the structural and sequential patterns of such graphs. Counting temporal butterflies is thus a fundamental task in analyzing temporal bipartite graphs. However, existing algorithms for butterfly counting on static bipartite graphs and for motif counting on temporal unipartite graphs are inefficient for this purpose. In this paper, we present a general framework with three sampling strategies for temporal butterfly counting. Since exact counting can be time-consuming on large graphs, our approach instead computes accurate approximate estimates efficiently. We also provide analytical bounds on the number of samples each strategy requires to obtain estimates with small relative error and high probability. Finally, we evaluate our framework on six real-world datasets and demonstrate its superior accuracy and efficiency compared to several baselines. Overall, our framework and sampling strategies provide efficient and accurate means of approximating temporal butterfly counts on large-scale temporal bipartite graphs.